# Glitch-aware Variable Pipeline Optimization for CGRAs

Takuya Kojima, Naoki Ando, Hayate Okuhara and Hideharu Amano, Keio University, Japan

## Introduction

A CGRA (Coarse-Grained Reconfigurable Array) is a platform proposed as an accelerator for upcoming IoT devices and wearable computers. Pipelined CGRAs in particular can achieve high energy efficiency by adapting their pipeline structure to the performance requirements. Depending on the pipeline configuration, however, glitch propagation can increase the dynamic power. This work proposes a dynamic power model that accounts for glitch effects, together with an optimization method based on it. Real-chip measurements show that the optimized pipeline structures consume less energy than fixed pipeline structures.

## Pipelined CGRA

### VPCMA (Variable Pipelined Cool Mega Array)



- PE (Processing Element)
  - No register file
  - No clock tree
- PE array
  - 12 cols $\times$ 8 rows of PEs

### Negative impact of glitch propagation

- PE unification with bypassing registers
  - Feasible when the total delay time of the PEs < clock cycle
  - Propagates undesirable switching (glitches) to the next PEs
  - Increases PE dynamic power

Figure: VPCMA diagram

Figure: VPCMA real chip

- Static configuration
- 7 pipeline registers
  - Variable pipeline structure
  - 2 modes of pipeline register
    - Latch mode
    - Bypass mode
  - Tradeoff between power and performance
- $\mu$-controller
  - Data transfer control between data memory and PE array
- Real chip of VPCMA
  - **Operating experience**
  - Achieving 2400 MOPS with 3.4 mW
- Other pipelined CGRAs
  - PipeRench, XPP, S5 Engine, EGRA, DT-CGRA
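The variable pipeline structure described above can be sketched as a vector of register modes. This is a minimal illustration, not the chip's actual configuration format: it assumes one pipeline register sits between each pair of adjacent PE rows (7 registers for 8 rows), and the names `N_REGISTERS`, `stage_count` and `stage_depths` are hypothetical.

```python
from itertools import product

# Assumption: one pipeline register between each pair of adjacent PE rows,
# giving 7 registers for the 8-row array.
N_REGISTERS = 7


def stage_count(latched):
    """Number of pipeline stages: every latch-mode register adds one stage."""
    return 1 + sum(latched)


def stage_depths(latched):
    """Number of unified PE rows in each pipeline stage, from top to bottom."""
    depths = []
    run = 1  # rows accumulated in the current stage
    for is_latched in latched:
        if is_latched:
            depths.append(run)  # a latch-mode register closes the stage
            run = 1
        else:
            run += 1  # a bypass-mode register unifies the next row
    depths.append(run)
    return depths


# Every register is either in latch mode (True) or bypass mode (False).
all_structures = list(product((False, True), repeat=N_REGISTERS))
```

Under this assumption there are 2^7 candidate structures, and the fixed 1-, 2-, 4- and 8-stage structures are four special cases of the same encoding.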



Figure (x-axis: number of pipeline stages)

- Power due to glitch propagation
  - Accounts for up to 80% of total power
- Fewer pipeline stages = more unified PEs
  - Power increases faster than linearly
- Optimizing the pipeline structure with a glitch-aware power model is necessary
  - Related work: glitch propagation models for FPGAs [1][2][3]

## Model considering glitches

- Dynamic power of the PE array: $P_{dyn} = E_{sw} S_{total} f$
  - $E_{sw}$: energy consumption per switching, $S_{total}$: total switching count, $f$: frequency
  - $\rightarrow$ A model to evaluate the switching count is needed

- Classification of switching
  1. Necessary switching for computation (depends on the mapped operations)
  2. Glitches generated within a single PE (depends on the mapped operations)
  3. Glitches due to propagation (depends on the switching of the previous PE)

- $S_{total} = \sum_{i=0}^{n} \sum_{j=0}^{m} S_{PE}(i,j)$
  - $n$: number of rows, $m$: number of cols, $i$: index of row, $j$: index of col, $S_{PE}(i, j)$: switching count of the PE at the $i$-th row and $j$-th col
- $S_{PE}(i,j) = S_{single}(op) + \beta \gamma^{length} \max_{dir} S_{prev}(dir)$
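The switching model above can be sketched in a few lines of Python. Several details are assumptions made only for illustration, since the poster does not define them: `length` is taken as the number of bypassed registers between a PE and the nearest latch-mode register above it, the candidate predecessors (`dir`) are the three nearest PEs in the previous row, the propagation term is dropped for PEs fed directly from a latch-mode register, and unused PEs contribute no switching. All function and parameter names are hypothetical.

```python
def pe_switching(ops, latched, s_single, beta, gamma):
    """Per-PE switching counts S_PE(i, j).

    ops      -- rows of operation names (None marks an unused PE)
    latched  -- latched[i - 1] is True if the register above row i latches
    s_single -- switching count of each operation in isolation
    """
    n_rows, n_cols = len(ops), len(ops[0])
    counts = [[0.0] * n_cols for _ in range(n_rows)]
    length = 0
    for i in range(n_rows):
        # Assumption: row 0 is fed from registers, so no glitch arrives there.
        if i == 0 or latched[i - 1]:
            length = 0
        else:
            length += 1
        for j in range(n_cols):
            op = ops[i][j]
            if op is None:
                continue
            s = s_single[op]
            if length > 0:
                # Assumption: inputs come from columns j-1, j, j+1 above.
                prev = [counts[i - 1][k] for k in (j - 1, j, j + 1)
                        if 0 <= k < n_cols]
                s += beta * gamma ** length * max(prev)
            counts[i][j] = s
    return counts


def total_switching(ops, latched, s_single, beta, gamma):
    """S_total: sum of S_PE(i, j) over the whole array."""
    return sum(sum(row) for row in pe_switching(ops, latched, s_single,
                                                beta, gamma))


def dynamic_power(e_sw, s_total, f):
    """P_dyn = E_sw * S_total * f."""
    return e_sw * s_total * f
```

With illustrative values ($\beta = 0.5$, $\gamma = 2$, 10 switchings per operation), bypassing every register in a 3-row array raises the per-PE count row by row, while latching every register keeps each PE at its isolated count.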

## Accuracy of the proposed model

- Measuring the power of all pipeline structures
  - 5 kinds of test benches
- Obtaining parameters ($E_{sw}$, $\beta$, $\gamma$)
  - With the least-squares method

Figure: relative mean error (%) per application (including dct and gray), proposed model vs. post-layout simulation
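The parameter fitting can be illustrated with a small sketch. The poster only states that a least-squares method is used; the particular procedure here (a grid search over $\beta$ and $\gamma$ with a closed-form least-squares solution for $E_{sw}$ at each grid point) is swapped in for illustration and is not necessarily what the authors did. The helper `chain_switching` is a hypothetical toy model of a single column of unified PEs, and all names are assumptions.

```python
def chain_switching(n_unified, beta, gamma, s_single=10.0):
    """Toy S_total for one column of n_unified PEs with all registers bypassed."""
    total = prev = 0.0
    for length in range(n_unified):
        s = s_single
        if length > 0:
            s += beta * gamma ** length * prev  # glitches from the PE above
        total += s
        prev = s
    return total


def fit_parameters(samples, betas, gammas):
    """Least-squares fit of (E_sw, beta, gamma).

    samples -- list of (s_total_fn, f, measured_power), where
               s_total_fn(beta, gamma) returns the modelled S_total.
    Returns (sum_squared_error, e_sw, beta, gamma) of the best grid point.
    """
    best = None
    powers = [p for _, _, p in samples]
    for beta in betas:
        for gamma in gammas:
            xs = [s_fn(beta, gamma) * f for s_fn, f, _ in samples]
            denom = sum(x * x for x in xs)
            if denom == 0.0:
                continue
            # P = E_sw * x is linear in E_sw, so its optimum is closed-form.
            e_sw = sum(x * p for x, p in zip(xs, powers)) / denom
            sse = sum((e_sw * x - p) ** 2 for x, p in zip(xs, powers))
            if best is None or sse < best[0]:
                best = (sse, e_sw, beta, gamma)
    return best


def make_sample(n_unified, f, e_sw, beta, gamma):
    """Synthetic 'measurement' generated from known parameters."""
    s_fn = lambda b, g, n=n_unified: chain_switching(n, b, g)
    return (s_fn, f, e_sw * s_fn(beta, gamma) * f)
```

On synthetic data generated from known parameters the search should recover them, which is a useful sanity check before fitting real measurements.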

## Power optimization





- Minimizing $P_{total} = \underbrace{E_{sw} S_{total} f}_{\text{proposed model}} + \underbrace{P_{reg,clk} \times N_{reg}}_{\text{power of registers}} + \underbrace{P_{leak}}_{\text{leakage}}$
- Constraint: critical path delay $\leq$ maximum allowed delay
- Compared to
  - Fixed pipeline structures
    - 1, 2, 4 and 8 stages
- Results
  - Smaller power consumption
  - Guarantee of operation
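The optimization problem above can be sketched as an exhaustive search. The poster does not say which search method is used; brute force is assumed here only because 7 two-mode registers give a small number of candidates. The sketch also simplifies the array to a single column, treats $N_{reg}$ as the number of latch-mode registers, and takes the critical path as the largest summed row delay within one stage. All names are hypothetical.

```python
from itertools import product


def column_switching(s_single_rows, latched, beta, gamma):
    """S_total for one column; latched[i - 1] is the register above row i."""
    total = prev = 0.0
    length = 0
    for i, s_single in enumerate(s_single_rows):
        length = 0 if (i == 0 or latched[i - 1]) else length + 1
        s = s_single
        if length > 0:
            s += beta * gamma ** length * prev  # propagated glitches
        total += s
        prev = s
    return total


def critical_path(delays, latched):
    """Largest accumulated delay between two latch-mode registers."""
    worst = run = 0.0
    for i, delay in enumerate(delays):
        if i > 0 and latched[i - 1]:
            run = 0.0  # a latch-mode register starts a new stage
        run += delay
        worst = max(worst, run)
    return worst


def optimize(s_single_rows, delays, max_delay,
             e_sw, beta, gamma, f, p_reg_clk, p_leak):
    """Return (P_total, latched) of the best feasible structure, or None."""
    best = None
    for latched in product((False, True), repeat=len(delays) - 1):
        if critical_path(delays, latched) > max_delay:
            continue  # violates the delay constraint
        s_total = column_switching(s_single_rows, latched, beta, gamma)
        n_reg = sum(latched)  # assumption: only latch-mode registers count
        p_total = e_sw * s_total * f + p_reg_clk * n_reg + p_leak
        if best is None or p_total < best[0]:
            best = (p_total, latched)
    return best
```

With illustrative numbers for a 4-row column (10 switchings and unit delay per row, $\beta = 0.5$, $\gamma = 2$, register power 15, leakage 5), the search picks a single latched register in the middle even when the delay budget would allow bypassing all of them, because the glitch term outweighs the saved register power.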



References:

- [1] C. Lei, et al. DAC'07, pp. 318-323
- [2] S. Cromar, et al. DAC'09, pp. 838-843
- [3] G. Altaf, et al. FPT'06, pp. 349-352