|  |  |  |
| --- | --- | --- |
| PROF. AKKARY | DEPT. OF ELECTRICAL AND COMPUTER ENGINEERING | December 19, 2011 |
|  | AMERICAN UNIVERSITY OF BEIRUT |  |
|  | **EECE421–COMPUTER Architecture** |  |
|  | **Quiz 2 – Fall 2011** |  |
|  |  |  |
| **NAME**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |  |  |
| **ID**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |  |  |

**INSTRUCTIONS:**

* **The duration of the exam is 120 minutes (2 hours).No time extension.**
* **The exam is closed-book/closed-notes.**
* **Carrying Cell phones is not allowed in the examination room.**
* **Write your name and ID. NumBer in the space provided above.**
* **Circle only one answer.**
* **READ THE QUESTIONS CAREFULLY BEFORE ANSWERING.**
* **in some questions, more than one choice may be a valid answer. Circle the best choice you think is the most appropriate answer to the question.**
* **There is no penalty for wrong answers.**
* **Use the back pages for scratch if needed**
* **Check that you have a total of 5 pages.**
* **No questions are allowed.**
* **You cannot leave the exam room for any reason until you complete the exam.**
1. A bimodal predictor is initialized with a value of 1 in every state machine entry. For a branch execution sequence of 001100, where 0 indicates taken and 1 indicates not taken, Which of the following statements is **TRUE**:
	1. There will be one misprediction
	2. There will be two mispredictions
	3. There will be three mispredictions
	4. There will be four mispredictions
	5. There will be five mispredictions
2. A branch in a gshare predictor has a current state of value 01. If the branch is incorrectly predicted not-taken, the next state will be:
	1. 00
	2. 01
	3. 10
	4. 11
	5. Not enough information
3. A branch in a bimodal predictor has a current state of value 00. If the branch is incorrectly predicted taken, the next state will be:
	1. 00
	2. 01
	3. 10
	4. 11
	5. Not enough information
4. Which statement is **FALSE:**
	1. A branch target buffer is used in branch prediction
	2. A branch target buffer is organized like a cache
	3. A branch target buffer is used to predict targets of all branches
	4. A branch target buffer is used when a branch is fetched
	5. A branch target buffer stores PC values as tags
5. A local correlating branch predictor uses 8K state machines. Which of the following statements is **TRUE?**
	1. It uses 10 bits of branch history
	2. It uses 11 bits of branch history
	3. It uses 12 bits of branch history
	4. None of the above
	5. Not enough information is given
6. A gshare branch predictor uses 7 history bits and 7 address bits. The size of the state machines array is:
	1. 128 entries
	2. 512 entries
	3. 2K entries
	4. 4K entries
	5. 8K entries
	6. 16K entries
7. A global branch predictor uses 12 bits of branch history and has an array of 4K state machines. Which of the following statements is **TRUE?**
	1. It uses 12 PC bits
	2. It uses all of the PC bits
	3. It does not use any of the PC bits
	4. None of the above
	5. Not enough information
8. A combined bimodal-gshare branch predictor uses 12 bits of branch history and 12 bits of branch PC. Which of the following statements is **TRUE?**
	1. It has a total of 1K state machines
	2. It has a total of 2K state machines
	3. It has a total of 4K state machines
	4. It has a total of 8K state machines
	5. It has a total of more than 8K state machines
9. Consider the following loop:

for (i = 1000; i > 0; i--)

 x[i] = x[i] + y[i];

Assuming that x and y are floating point arrays, and that the addresses of entries x[0], x[1000], y[0] and y[1000] are in registers R1, R2, R3 and R4 respectively. Assume a floating point add requires 3 execution cycles. Also assume a MIPS scalar (1-wide) processor that implements register bypassing and delayed branch, with no branch prediction and with branches executed in the EU pipeline stage.

An optimizing compiler that does not perform loop unrolling will achieve an execution rate of:

1. 5 cycles per iteration
2. 6 cycles per iteration
3. 7 cycles per iteration
4. 8 cycles per iteration
5. 9 cycles per iteration
6. Which of the following statements is **TRUE**:
	1. VLIW processors do not incur pipeline stalls between instructions.
	2. VLIW processors do not perform branch prediction.
	3. VLIW processors do not check for exceptions.
	4. VLIW processors do not execute instructions out-of-order
	5. None of the above is true.
7. Which of the following statements is **False**:
	1. In Predicated execution, the hardware computes the predicate registers, and the hardware writes back the results of the instructions with true predicate values and discards the results of instructions with false predicate values.
	2. Predicated execution increases instruction size.
	3. Predicated execution eliminates some branches.
	4. Predicated execution gives better performance than branch prediction since it avoids some branch mispredictions.
	5. Predicated execution is a hardware only ILP technique.
8. Choose the pair of terms that are NOT related:
	1. Predicated execution and branches
	2. Stop bits and memory dependences
	3. Stop bits and register dependences
	4. Data speculation and memory dependences
	5. Control speculation and branches
9. If an EPIC compiler uses control speculation to move the load instruction above the branch in the following code segment.

Br R1, R5

Ld R3, (R4)

Add R5, R3, #1

The number of instructions in the compiled code segment will be:

* 1. 3 instructions
	2. 4 instructions
	3. 5 instructions
	4. 6 instructions
	5. 7 instructions

1. A compiler can safely move the Add before the branch in the following instruction sequence.

BEQ R1, R0, Label

Add R2, R2, #1

Ld R3, (R2)

Label: Ld R2, (R3)

* 1. True
	2. False
1. VLIW consumes less total power executing a program than superscalars because it uses less complex hardware, therefore less logic gates.
	1. True
	2. False
2. Which of the following statements about hardware methods for exposing ILP is **true?**
	1. Hardware ILP methods are less common on modern processors because of their complexity.
	2. Hardware ILP methods exploit instruction level parallelism between loop iterations, which can also be done with loop unrolling.
	3. Hardware ILP methods make debugging programs very difficult due to out-of-order execution and imprecise exceptions.
	4. Unlike hardware ILP methods, all software ILP methods Increase code size
	5. Hardware ILP methods are less effective in handling cache misses since they cannot schedule loads ahead of earlier stores as can be done by the Itanium compiler.
3. The main benefit of loop unrolling is to reduce loop overhead.
	1. True
	2. False
4. The degree of loop unrolling by a compiler is limited by the code size increase and the instruction cache capacity, but not by registers.
	1. True
	2. False
5. Choose the one **TRUE** statement about write-through cache.
	1. On a store miss, the store is written to memory, but only if the cache is not write-allocate.
	2. On a store hit, the store is written to memory, but only if the cache is not write-allocate.
	3. On a store hit, the store is written to memory, but only if the cache is write-allocate.
	4. A store is written to memory regardless if the store hit or miss the cache.
6. Main use of a write buffer with a cache is to reduce store misses.
	1. True
	2. False
7. Main use of a victim buffer with a cache is to reduce conflict misses.
	1. True
	2. False
8. A cache has a total capacity of 64K bytes. It is implemented as 4-way set associative, with block size of 64 bytes. The physical address on the machine consists of 32 bits.

Which of the following statements is **TRUE**?

* 1. Number of set index bits = 7 and number of tag bits = 20
	2. Number of set index bits = 8 and number of tag bits = 19
	3. Number of set index bits = 9 and number of tag bits = 18
	4. Number of set index bits = 8 and number of tag bits = 18
	5. None of the above
1. A cache has a total capacity of 4K bytes. It is implemented as a fully set associative cache, with block size of 64 bytes. The physical address on the machine consists of 40 bits.

Which of the following statements is **TRUE**?

* 1. Number of tag bits = 33
	2. Number of tag bits = 34
	3. Number of block offset bits = 5
	4. Number of set index bits = 34
	5. None of the above
1. Consider the following sequence of address references issued by a CPU to a cache.

0xFFFF0108, 0xFFFFC100, 0X1FFF7100, 0XFFFF4100, 0xFFFFC104, 0xFFFF4108, 0xFFFFC100

Assuming a 2-way set associative cache with block size of 16 bytes, total capacity of 8Kbytes and LRU replacement policy. Which of these statements is **TRUE**?

1. There will be 6 compulsory misses and 1 hit
2. There will be 3 compulsory misses, 1 conflict miss and 3 hits
3. There will be 4 compulsory misses, 2 conflict misses and 1 hit
4. There will be 4 compulsory misses, 1 conflict miss and 2 hits
5. There will be 4 compulsory misses and 3 conflict misses
6. Cache A has a larger number of sets than cache B. Therefore, Cache A has less capacity misses.
	1. True
	2. False