|  |  |  |
| --- | --- | --- |
| PROF. AKKARY | DEPT. OF ELECTRICAL AND COMPUTER ENGINEERING | January 3, 2011 |
|  | AMERICAN UNIVERSITY OF BEIRUT |  |
|  | **EECE 421 – COMPUTER ARCHITECTURE** |  |
|  | **Quiz 3 – Fall 2010** |  |
|  |  |  |
| **NAME**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |  |  |
| **ID**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |  |  |

**INSTRUCTIONS:**

* **The duration of the exam is TWO hours. No time extension.**
* **The exam is closed-book/closed-notes.**
* **Using cell phones is not allowed in the examination room.**
* **Write your name and ID number in the space provided above.**
* **Circle only one answer.**
* **READ THE QUESTIONS CAREFULLY BEFORE ANSWERING.**
* **In some questions, more than one choice may be a valid answer. Circle the choice you think is the most appropriate answer to the question.**
* **ALL QUESTIONS ARE EQUALLY WEIGHTED.**
* **There is no penalty for wrong answers.**
* **Use the back pages for scratch work if needed.**
* **Check that you have a total of 5 pages.**
* **No questions are allowed.**
* **You cannot leave the exam room for any reason until you complete the exam.**
1. Which of the following statements about software methods for exposing ILP is **FALSE**:
	1. The compiler is used to improve the performance of pipelines.
	2. They use static scheduling instead of dynamic scheduling.
	3. They can provide better performance than hardware techniques, but only if the programmer is involved in exposing the ILP by writing parallel programs.
	4. Optimizing compilers make memory write-after-write and write-after-read hazards more probable.
	5. The compiler is used to improve the performance of simple multiple-issue processors.
2. Which of the following statements about hardware methods for exposing ILP is **TRUE**?
	1. They are less common on modern processors because of their complexity.
	2. They exploit instruction level parallelism between loop iterations, which can also be done with loop unrolling.
	3. They make debugging programs very difficult due to out-of-order execution and imprecise exceptions.
	4. All software ILP methods have equivalent hardware ILP methods.
	5. They are less effective in handling cache misses since they cannot schedule loads ahead of earlier stores as can be done by the Itanium compiler.
3. Which of the following statements is **TRUE**:
	1. The reorder buffer is a buffer for instruction results.
	2. Most microprocessors that feature out-of-order execution use a future file.
	3. A history buffer provides better performance than a reorder buffer.
	4. Instructions read operands and/or tags from the reorder buffer in program order.
	5. Instructions write results to the reorder buffer in-order.
4. A processor provides support for data speculation. The compiler decides to move the Ld instruction to the beginning of the following code segment.

BEQ R0, R2, L1

Add R2, R2, #200

L1: Ld R3, 100(R1)

The number of instructions in the compiled code will be:

	1. 3 instructions
	2. 4 instructions
	3. 5 instructions
	4. None of the above.
	5. It could be any of the above, depending on the instructions that follow the segment.
5. Which of the following statements is **TRUE**:
	1. Tomasulo’s algorithm avoids write-after-write hazards by register renaming.
	2. Tomasulo’s algorithm avoids write-after-write hazards by stalling at the decode stage.
	3. Tomasulo’s algorithm avoids write-after-write hazards by using a reorder buffer.
	4. Tomasulo’s algorithm avoids write-after-write hazards by writing results into the register file in-order.
	5. Tomasulo’s algorithm avoids write-after-write hazards using tag matching in the RF.
6. Which of the following statements is **TRUE**:
	1. The Copy Propagation optimization is useful for software pipelining.
	2. The Copy Propagation optimization is useful for predicated execution.
	3. The Copy Propagation optimization is useful for loop unrolling.
	4. None of the above.
	5. All of the above.
7. Choose the one **FALSE** statement about recurrence software optimization.

	1. It may be useful when the value of an expression in one loop iteration is a function of a previous iteration.
	2. It is useful only for expressions that perform a sum function.
	3. It is useful only when performing loop unrolling.
	4. It is useful only with processors that can perform multiple operations in the same cycle.
	5. It is not useful for software pipelining.
8. Which of the following is **TRUE**?
	1. The most significant factor in the performance of a VLIW processor is its compiler.
	2. NOPs are needed in VLIW programs only if its compiler uses global scheduling.
	3. NOPs impact VLIW processor power but not performance.
	4. All of the above are true.
9. The main benefit of loop unrolling is to reduce loop overhead.
	1. True
	2. False
10. The degree of loop unrolling by a compiler is limited by the code size increase and the instruction cache capacity, but not by registers.
	1. True
	2. False
11. Loop unrolling without static scheduling reduces performance due to code size increase.
	1. True
	2. False
12. VLIW and Itanium architectures differ in how they access register operands, but not memory operands.
	1. True
	2. False
13. Choose the most accurate statement:
	1. There is one main reason for code size increase in VLIW processors.
	2. There are 2 main reasons for code size increase in VLIW processors.
	3. There are 3 main reasons for code size increase in VLIW processors.
	4. Code size increase in VLIW processors occurs only if a trace scheduling compiler is used.
14. Which of the following statements is **TRUE**:
	1. VLIW processor hardware does not check for dependences between instructions.
	2. VLIW processor hardware does not perform branch prediction.
	3. VLIW processor hardware does not check for exceptions.
	4. None of the above is true.
15. Which of the following statements is **FALSE**:
	1. Pentium Pro performed register renaming by assigning different registers in a register alias table to instructions that write the same destination register.
	2. Pentium Pro converted all CISC instructions to RISC-like micro-ops.
	3. Pentium Pro was 3-wide superscalar.
	4. Pentium Pro used one centralized set of reservation stations.
	5. Pentium Pro used a BTB for branch prediction.
16. Which of the following statements is **FALSE**:
	1. In predicated execution processors the compiler computes the predicate registers, and the hardware writes back the results of the instructions with true predicate value and discards the results of instructions with false predicate value.
	2. Predicated execution increases code size.
	3. Predicated execution eliminates some branches.
	4. Predicated execution gives better performance than branch prediction since it avoids some branch mispredictions.
17. A local correlating branch predictor uses 12 history bits and 10 address bits. The size of the state machines array is:
	1. 512 entries
	2. 1K entries
	3. 2K entries
	4. 4K entries
	5. None of the above
18. Choose the pair of terms that are related:
	1. Control speculation and memory disambiguation
	2. Stop bits and binary compatibility
	3. Data speculation and exceptions
	4. Predicated execution and VLIW processors
	5. None of the above
19. Choose the pair of terms that are related:
	1. Trace compiler and VLIW processors
	2. Trace compiler and software branch prediction
	3. Trace compiler and Amdahl’s law
	4. All of the above
	5. None of the above
20. An EPIC compiler uses data speculation to move the load instruction and the instructions that use the load data above the store in the following code segment.

Store R1, (R2)

Ld R3, (R4)

Add R5, R3, #1

The number of instructions in the compiled code segment will be:

	1. 3 instructions
	2. 4 instructions
	3. 5 instructions
	4. 6 instructions
	5. 7 instructions
21. A compiler can safely reorder the following instruction sequence.

BEQ R1, R0, Label

Ld R2, 100(R1)

	1. True
	2. False
22. The EPIC architecture does not provide precise exception state since loads that are moved ahead of branches could cause exceptions.
	1. True
	2. False
23. VLIW consumes less power than superscalars because it uses less complex hardware and therefore fewer logic gates.
	1. True
	2. False
24. Software pipelining does not help performance of superscalar and VLIW architectures.
	1. True
	2. False
25. Software pipelining may not help performance of superscalar and VLIW architectures as much as loop unrolling.
	1. True
	2. False