







## Limits to ILP: HW Model Comparison


|                                  | Model    | Power 5                            |
|----------------------------------|----------|------------------------------------|
| Instructions Issued<br>per clock | Infinite | 4                                  |
| Instruction Window<br>Size       | Infinite | 200                                |
| Renaming<br>Registers            | Infinite | 48 integer +<br>40 Fl. Pt.         |
| Branch Prediction                | Perfect  | 2% to 6%<br>misprediction<br>(Tournament<br>Branch Predictor) |
| Cache                            | Perfect  | 64KI, 32KD, 1.92MB<br>L2, 36 MB L3 |
| Memory Alias<br>Analysis         | Perfect  | ??                                 |



|                                   | New<br>Model                  | Model    | Power 5                            |
|-----------------------------------|-------------------------------|----------|------------------------------------|
| Instructions<br>Issued per clock  | Infinite                      | Infinite | 4                                  |
| (Vary) Instruction<br>Window Size | Infinite, 2K,<br>512, 128, 32 | Infinite | 200                                |
| Renaming<br>Registers             | Infinite                      | Infinite | 48 integer +<br>40 Fl. Pt.         |
| Branch<br>Prediction              | Perfect                       | Perfect  | 2% to 6%<br>misprediction<br>(Tournament Branch<br>Predictor) |
| Cache                             | Perfect                       | Perfect  | 64KI, 32KD, 1.92MB<br>L2, 36 MB L3 |
| Memory Alias                      | Perfect                       | Perfect  | ??                                 |
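
The effect of shrinking the instruction window can be illustrated with a toy model. This is a minimal sketch, not the simulator behind these tables: it assumes unit latency, unlimited issue width, and no other limits. The function name `ipc_with_window` and the dependence trace are hypothetical, introduced only for illustration.

```python
def ipc_with_window(deps, window=None):
    """Return instructions per clock for a toy dependence trace.

    deps[i] lists the earlier instructions whose results instruction i needs.
    window is the number of oldest unissued instructions the hardware may
    examine each cycle (None models an infinite window).
    Assumptions: unit latency, unlimited issue width, no other limits.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    if not deps:
        return 0.0
    issue_cycle = {}                    # instruction index -> cycle it issued
    pending = list(range(len(deps)))    # unissued instructions, program order
    cycle = 0
    while pending:
        cycle += 1
        candidates = pending if window is None else pending[:window]
        # Ready means every producer issued in an earlier cycle.
        ready = [i for i in candidates
                 if all(issue_cycle.get(d, cycle) < cycle for d in deps[i])]
        for i in ready:
            issue_cycle[i] = cycle
        pending = [i for i in pending if i not in issue_cycle]
    return len(deps) / cycle


# Four independent chains of four dependent instructions each, laid out one
# chain after another in program order (a made-up trace).
trace = [[] if i % 4 == 0 else [i - 1] for i in range(16)]

for w in (None, 8, 4, 1):
    print(w, ipc_with_window(trace, w))
```

With an infinite window the model overlaps all four chains. A small window cannot see far enough ahead to find the independent work, so the achieved IPC drops toward that of a single chain.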



|                                             | New Model                                         | Model      | Power 5                                         |
|---------------------------------------------|---------------------------------------------------|------------|-------------------------------------------------|
| Instructions<br>Issued per<br>clock         | 64                                                | Infinite   | 4                                               |
| Instruction<br>Window Size                  | 2048                                              | Infinite   | 200                                             |
| Renaming<br>Registers                       | Infinite                                          | Infinite   | 48 integer +<br>40 Fl. Pt.                      |
| <mark>(Vary)</mark><br>Branch<br>Prediction | Perfect vs. 8K<br>Tournament vs.<br>512 2-bit vs.<br>profile vs. none | Perfect    | 2% to 6%<br>misprediction<br>(Tournament Branch<br>Predictor) |
| Cache                                       | Perfect                                           | Perfect    | 64KI, 32KD, 1.92MB<br>L2, 36 MB L3              |
| Memory<br>Alias                             | Perfect                                           | Perfect    | ??                                              |
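
The "512 2-bit" option in the table refers to a table of 2-bit saturating counters. The sketch below shows one way such a predictor could behave. The initial counter value, the modulo indexing, and the names `misprediction_rate` and `loop_trace` are assumptions made for illustration, not details taken from the study.

```python
def misprediction_rate(trace, entries=512):
    """trace: iterable of (branch_pc, taken) pairs with taken a bool.

    Returns the fraction of branches mispredicted by a table of 2-bit
    saturating counters.
    """
    counters = [2] * entries      # start weakly taken (assumption)
    wrong = total = 0
    for pc, taken in trace:
        idx = pc % entries        # simple direct-mapped index (assumption)
        predicted_taken = counters[idx] >= 2
        wrong += predicted_taken != taken
        total += 1
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
    return wrong / total if total else 0.0


# A loop branch at a made-up PC: taken 9 times, then not taken, 10 times over.
loop_trace = [(0x40, i % 10 != 9) for i in range(100)]
print(misprediction_rate(loop_trace))
```

In this trace the counter mispredicts only the loop exit, because a single not-taken outcome moves it from strongly taken to weakly taken without flipping the prediction.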






|                                              | New Model                             | Model    | Power 5                            |
|----------------------------------------------|---------------------------------------|----------|------------------------------------|
| Instructions<br>Issued per<br>clock          | 64                                    | Infinite | 4                                  |
| Instruction<br>Window Size                   | 2048                                  | Infinite | 200                                |
| <mark>(Vary)</mark><br>Renaming<br>Registers | Infinite v. 256,<br>128, 64, 32, none | Infinite | 48 integer +<br>40 Fl. Pt.         |
| Branch<br>Prediction                         | 8K 2-bit                              | Perfect  | Tournament Branch<br>Predictor     |
| Cache                                        | Perfect                               | Perfect  | 64KI, 32KD, 1.92MB<br>L2, 36 MB L3 |
| Memory<br>Alias                              | Perfect                               | Perfect  | Perfect                            |
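
Why the number of renaming registers matters can be seen in a small sketch of renaming itself. It assumes an unlimited pool of physical registers, which corresponds to the "Infinite" column. The instruction encoding and the names `rename` and `program` are hypothetical.

```python
from itertools import count


def rename(program):
    """program: list of (dest, (src, ...)) using architectural register names.

    Returns the same instructions with every destination given a fresh
    physical register, so only true (read-after-write) dependences remain.
    Assumption: the supply of physical registers never runs out.
    """
    fresh = (f"p{i}" for i in count())
    mapping = {}                  # architectural name -> current physical name
    renamed = []
    for dest, srcs in program:
        phys_srcs = []
        for s in srcs:
            if s not in mapping:  # first use of a live-in value
                mapping[s] = next(fresh)
            phys_srcs.append(mapping[s])
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], tuple(phys_srcs)))
    return renamed


program = [
    ("r1", ("r2", "r3")),   # r1 = r2 op r3
    ("r4", ("r1", "r5")),   # reads r1: true dependence on the first instruction
    ("r1", ("r6", "r7")),   # rewrites r1: WAW with the first, WAR with the second
]
for instr in rename(program):
    print(instr)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to read the old value. With a finite pool, the renamer would have to stall whenever no free register is available, which is the limit the table varies.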



|                                        | New Model                                 | Model    | Power 5                            |
|----------------------------------------|-------------------------------------------|----------|------------------------------------|
| Instructions<br>Issued per<br>clock    | 64                                        | Infinite | 4                                  |
| Instruction<br>Window Size             | 2048                                      | Infinite | 200                                |
| Renaming<br>Registers                  | 256 Int + 256 FP                          | Infinite | 48 integer +<br>40 Fl. Pt.         |
| Branch<br>Prediction                   | 8K 2-bit                                  | Perfect  | Tournament                         |
| Cache                                  | Perfect                                   | Perfect  | 64KI, 32KD, 1.92MB<br>L2, 36 MB L3 |
| <mark>(Vary)</mark><br>Memory<br>Alias | Perfect v. Stack<br>v. Inspect v.<br>none | Perfect  | Perfect                            |



|                                     | New Model                        | Model    | Power 5                            |
|-------------------------------------|----------------------------------|----------|------------------------------------|
| Instructions<br>Issued per<br>clock | 64 (no<br>restrictions)          | Infinite | 4                                  |
| Instruction<br>Window Size          | Infinite vs. 256,<br>128, 64, 32 | Infinite | 200                                |
| Renaming<br>Registers               | 64 Int + 64 FP                   | Infinite | 48 integer +<br>40 Fl. Pt.         |
| Branch<br>Prediction                | 1K 2-bit                         | Perfect  | Tournament                         |
| Cache                               | Perfect                          | Perfect  | 64KI, 32KD, 1.92MB<br>L2, 36 MB L3 |
| Memory<br>Alias                     | HW<br>disambiguation             | Perfect  | Perfect                            |





### Which is better for increasing ILP: HW vs. SW

Memory disambiguation:

- HW best
- Compile time pointer analysis is hard

Speculation:

- HW best when dynamic branch prediction is better than compile time prediction
  - Profiling is not good enough
- Exceptions easier for HW
  - HW doesn't need bookkeeping code or compensation code
- Speculation is very complicated to get right
  - Execution is hard enough to get right without speculation
  - Speculation leads to many special cases
  - Hard to get right

Scheduling:

- SW can look ahead to schedule better, look beyond current PC

Advantage for HW based:

- Compiler independence: does not require new compiler, recompilation to run well
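
The memory disambiguation point can be made concrete with a sketch of the run-time check. Hardware sees actual addresses, so it can compare a load against earlier stores directly, whereas a compiler must prove that two pointers never alias. The function `loads_safe_to_hoist` and the operation format are hypothetical, and the sketch treats every earlier store as still in flight.

```python
def loads_safe_to_hoist(ops):
    """ops: list of ("load" | "store", address) in program order.

    Returns the indices of loads whose address matches no earlier store, so
    they could execute ahead of those stores.
    Simplification: every earlier store is treated as still in flight.
    """
    earlier_store_addrs = set()
    safe = []
    for i, (kind, addr) in enumerate(ops):
        if kind == "store":
            earlier_store_addrs.add(addr)
        elif addr not in earlier_store_addrs:
            safe.append(i)
    return safe


# Made-up addresses: the last load reads the location the first store wrote.
ops = [("store", 0x100), ("load", 0x200), ("store", 0x300), ("load", 0x100)]
print(loads_safe_to_hoist(ops))
```

Only the load from 0x200 is reported as safe. The load from 0x100 conflicts with the earlier store to the same address and must wait for it.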



#### **Outline**

- ILP vs. TLP
- Multithreading
- Simultaneous Multithreading

#### **Thread Level Parallelism (TLP)**

- ILP exploits implicit parallel operations within a loop or straight-line code segment
- TLP is explicitly represented by the use of multiple threads of execution that are inherently parallel
- TLP goal: use multiple instruction streams to improve
  - Throughput of computers that run many programs
  - Execution time of multi-threaded programs
- TLP could be more cost-effective to exploit than ILP
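
The difference between implicit and explicit parallelism shows up in source code. In the sketch below the programmer states the parallelism by handing independent chunks of work to separate threads, rather than leaving the hardware to discover it inside a single instruction stream. The function name `parallel_sum` and the chunking scheme are illustrative assumptions. In standard CPython builds the global interpreter lock keeps this CPU-bound example from running faster, so it shows the programming model, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Sum a list by giving each thread its own independent chunk."""
    chunk = max(1, -(-len(values) // workers))   # ceiling division
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partial sum is an independent instruction stream.
        return sum(pool.map(sum, parts))


print(parallel_sum(list(range(1000))))
```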

NOW Handout Page (#)






















- Not affecting clock cycle time, especially in
  - Instruction issue: more candidate instructions need to be considered
  - Instruction completion: choosing which instructions to commit may be challenging
- Ensuring that cache and TLB conflicts generated by SMT do not degrade performance

9/26/07


