

## Introduction

#### Goals

Develop services that leverage partially reconfigurable (PR) FPGAs for high-performance embedded reconfigurable computing (RC) systems

#### **Motivations**

**Configuration Prefetching and Configuration Reuse Details** 



Single and multiple PRR systems

- PR FPGAs enable preemption and resumption of hardware (HW) tasks in PR regions (PRRs) without losing the tasks' execution state
- PR FPGAs enable HW multitasking over shared resources
- The time to reconfigure a PRR delays HW task execution
- Reconfiguration time can be reduced or hidden using:
  - Configuration prefetching
  - Configuration reuse
- Prior works provide only partial solutions and no physical implementation

### Approach

- Leverage capabilities of PR FPGAs
- Implement services with portability across different FPGA architectures

### Accomplishments

Novel implementation of configuration prefetching and reuse for preemptive HW multitasking on a Virtex-5 FPGA



# **Configuration Prefetching and Reuse on PR FPGAs**

### **Overview**

Configuration prefetching and configuration reuse reduce the time to reconfigure a PRR on any PR system

## Approach

- Leverage ICAP and bitstream manipulations
- Use internal GSR signal and protection/unprotection mechanism for static region and PRRs
  - Protection: avoids GSR reinitialization of flip-flops and BRAMs
  - Unprotection: allows GSR reinitialization of flip-flops and BRAMs
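The protection rule above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: `Region`, `pulse_gsr`, and `restore_context` are hypothetical names, and treating a context restore as "set the initial values, unprotect, pulse GSR, protect again" is an assumed simplification of the ICAP and bitstream steps, not the exact sequence used on the device.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical model of a static region or PRR with a GSR protection flag."""
    name: str
    state: dict             # current flip-flop/BRAM contents
    init: dict              # values a GSR pulse would load into the region
    protected: bool = True  # protected regions ignore GSR


def pulse_gsr(regions):
    """Reinitialize state only in regions that are currently unprotected."""
    for region in regions:
        if not region.protected:
            region.state = dict(region.init)


def restore_context(target, others, saved_state):
    """Load saved_state into target via GSR without disturbing the other regions."""
    target.init = dict(saved_state)  # assumption: stands in for bitstream manipulation
    target.protected = False         # unprotect only the PRR being restored
    pulse_gsr([target, *others])     # GSR is global, so every region sees the pulse
    target.protected = True          # protect again before the task resumes


static = Region("static", state={"pc": 7}, init={"pc": 0})
prr = Region("prr0", state={"acc": 0}, init={"acc": 0})
restore_context(prr, [static], {"acc": 42})
print(prr.state, static.state)
```

In this model the static region keeps its state because it stays protected while the GSR pulse is applied, which is the property the protection mechanism is meant to provide.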

### **Benefits**

- Prefetching: PRR reconfiguration overlaps HW task execution in the same PRR without affecting execution of the current HW task
- Reuse: no PRR reconfiguration is needed if a preempted HW task needs to resume its last execution
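A minimal timing sketch may help compare the two benefits with plain on-demand reconfiguration. It assumes a deliberately simple model: `start_delay` and its `overlap` parameter are hypothetical, times are in arbitrary units rather than measured values, and context save/restore costs are ignored.

```python
def start_delay(t_prr, policy, overlap=0.0):
    """Delay between the PRR becoming free and the next HW task starting.

    t_prr   -- time to reconfigure the PRR
    policy  -- "on_demand", "prefetch", or "reuse"
    overlap -- for "prefetch": how much of the current task's execution
               the reconfiguration was able to overlap
    """
    if t_prr < 0 or overlap < 0:
        raise ValueError("times must be non-negative")
    if policy == "on_demand":
        return t_prr                      # full reconfiguration is exposed
    if policy == "prefetch":
        return max(0.0, t_prr - overlap)  # only the non-overlapped part is exposed
    if policy == "reuse":
        return 0.0                        # configuration is already in the PRR
    raise ValueError(f"unknown policy: {policy}")


for policy, overlap in [("on_demand", 0.0), ("prefetch", 4.0), ("reuse", 0.0)]:
    print(policy, start_delay(10.0, policy, overlap))
```

Under this model, prefetching hides the reconfiguration time completely once the overlap is at least as long as the reconfiguration itself, while reuse removes it regardless of how long the previous task ran.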

#### **Execution time to reconfigure PRR ($T_{prr}$)**



#### **Execution time for context save ($T_{save}$)**

[Plot: stacked components $T_{save\_icap}$, $T_{save\_cpu}$, and $T_{save\_ov}$; y-axis from 0.0 to 40.0]

#### PR system with a MicroBlaze softcore processor

- Executes Linux OS
- Executes a software application that orchestrates context save (CS), context restore (CR), configuration prefetching, and configuration reuse
- $T_{prr}$: linear growth rate
  - Depends on PRR size
  - $T_{prr\_icap}$, $T_{prr\_cpu}$, and $T_{prr\_ov}$ are the ICAP, CPU, and overhead execution times
- $T_{save}$: linear growth rate
  - Depends on the number of flip-flops and BRAMs used in the HW task
  - Includes unprotection and protection of the PRR

#### No tool flow changes needed

Fundamentals can be extended to newer device families (7 Series, Zynq-7000, UltraScale)

#### Experiments

- Testbed: Virtex-5 LX110T, 100 MHz, one PRR, OpenSPARC board, embedded Linux OS
- PRRs: implement two HW tasks, with CLBs and BRAMs
- Static region: MicroBlaze, Ethernet interface, FSLs, ICAP, GPIOs, DDR2 SDRAM, CompactFlash interface



- **FPGA** – Field Programmable Gate Array
- **PRR** – Partially Reconfigurable Region
- **CLB** – Configurable Logic Block
- **ICAP** – Internal Configuration Access Port
- **GSR** – Global Set and Reset
- **FSL** – Fast Simplex Link
- **GPIO** – General Purpose Input/Output

[Plot x-axis: HW task flip-flops, 160 to 1760 in steps of 160]

#### **Execution time for context restore ($T_{rest}$)**



- For $T_{save}$ (which includes unprotection and protection of the PRR), $T_{save\_icap}$, $T_{save\_cpu}$, and $T_{save\_ov}$ are the ICAP, CPU, and overhead execution times

- $T_{rest}$: linear growth rate
  - Depends on PRR size
  - Includes unprotection and protection of the PRR
  - $T_{rest\_icap}$, $T_{rest\_cpu}$, and $T_{rest\_ov}$ are the ICAP, CPU, and overhead execution times
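The linear growth of $T_{prr}$, $T_{save}$, and $T_{rest}$ can be written as a small cost model. The sketch below assumes that each total is the sum of its ICAP, CPU, and overhead components and that each component is itself linear in the size parameter (PRR size, or flip-flop and BRAM count for context save). `LinearCost` and `total_time` are hypothetical names, and the coefficients are placeholders in arbitrary units, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """One component modeled as t(size) = fixed + per_unit * size."""
    fixed: float
    per_unit: float

    def __call__(self, size):
        return self.fixed + self.per_unit * size


def total_time(size, icap, cpu, ov):
    """Assumed decomposition: total = ICAP + CPU + overhead components."""
    return icap(size) + cpu(size) + ov(size)


# Placeholder coefficients for illustration only (not measurements).
icap = LinearCost(fixed=2.0, per_unit=0.5)
cpu = LinearCost(fixed=1.0, per_unit=0.25)
ov = LinearCost(fixed=0.5, per_unit=0.0)

for size in (100, 200, 300):
    print(size, total_time(size, icap, cpu, ov))
```

Because a sum of linear components is linear, equal steps in size give equal steps in total time, which is the behavior a fit against measured data would be checked for.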