# Real Chip Evaluation of a Low Power CGRA with Optimized Application Mapping

<u>Takuya Kojima</u>, Naoki Ando, Yusuke Matsushita, Hayate Okuhara, Nguyen Anh Vu Doan and Hideharu Amano

Keio University, Japan

International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART2018), Toronto, Canada

## Outline

- Introduction
- A CGRA Architecture
- Three Types of Control
  - 1. Pipeline Structure Control
  - 2. Body Bias Control
  - 3. Application Mapping
- New Mapping Optimization Method
- Real Chip Implementation
- Experimental Results
- Conclusion

#### Importance of Low Power Consumption

- Forthcoming
  - IoT devices
  - Wearable computing
  - Sensor network
- Challenges
  - High performance
    - For image processing
  - Low power consumption
    - For long battery life





#### SF-CGRAs: Straight-Forward **Coarse-Grained Reconfigurable Arrays**



Key features of straight-forward CGRAs

- Limited data flow direction
- Less frequent reconfiguration
- Pipelined PE array
- High energy efficiency

## VPCMA: Variable Pipelined Cool Mega Array [1]



[1] N. Ando, et al. "Variable pipeline structure for Coarse Grained Reconfigurable Array CMA." *Field-Programmable Technology*, 2016.

## **Pipeline Structure Control**



| Number of Pipeline Stages          | Large | Small |
|------------------------------------|-------|-------|
| Operating Frequency                |       |       |
| Throughput                         |       |       |
| Glitch Propagation                 |       |       |
| Dynamic Power of Registers & Clock |       |       |


## Body Bias Effects on SOTB



#### SOTB Technology

- 65 nm process
- One of the FD-SOI technologies
- Supports body biasing



## **Row-level Body Bias Control**

Delay Time of PE for Each Opcode





## How to map an application to the PE array?



- An application is represented as a data flow graph (DFG)
- Various mappings exist
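As a rough illustration of why several mappings can exist for one DFG, the sketch below places a toy graph on a PE array and checks each placement. The DFG, the array size, and the rule that data may only flow to a later row are assumptions made for this example (a simplified reading of the limited data flow direction), not the actual VPCMA routing constraints.

```python
from typing import Dict, List, Tuple

# Hypothetical DFG: each node lists the nodes that consume its result.
DFG = Dict[str, List[str]]
# Placement: node name -> (row, column) of the PE it occupies.
Placement = Dict[str, Tuple[int, int]]

def is_valid_mapping(dfg: DFG, placement: Placement,
                     n_rows: int, n_cols: int) -> bool:
    """Check a placement under the simplified rules assumed here."""
    # Every node must be placed exactly once.
    if set(placement) != set(dfg):
        return False
    # No two nodes may share a PE.
    if len(set(placement.values())) != len(placement):
        return False
    # Every PE coordinate must lie inside the array.
    for row, col in placement.values():
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            return False
    # Assumed direction rule: a consumer sits in a later row than its producer.
    return all(placement[src][0] < placement[dst][0]
               for src, dsts in dfg.items() for dst in dsts)

dfg = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}

# Two different placements of the same DFG on an illustrative 8x4 array.
compact = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (2, 0)}
stretched = {"a": (0, 0), "b": (1, 1), "c": (3, 0), "d": (7, 0)}
# This one sends data backwards (d is placed above its producer c).
backwards = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (0, 2)}

compact_ok = is_valid_mapping(dfg, compact, 8, 4)
stretched_ok = is_valid_mapping(dfg, stretched, 8, 4)
backwards_ok = is_valid_mapping(dfg, backwards, 8, 4)
```

Both `compact` and `stretched` pass the check, yet they occupy different PE rows, which is what makes the choice of mapping relevant to row-level controls.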


#### Complexity of Mapping Optimization



## Related work

- 1. Performance & power optimization for CGRAs [2]
  - Considers VDD control
  - Optimization priority: performance > power
- 2. Body bias domain size exploration for CGRAs [3]
  - Analysis of area overhead and power reduction effects
  - Does not take dynamic power into account
- 3. Pipeline & body bias optimization for CGRAs [4]
  - Method using integer linear programming
  - Assumes static mapping

[2] J. Gu, et al. "Energy-aware loops mapping on multi-vdd CGRAs without performance degradation." *Design Automation Conference (ASP-DAC), 2017 22nd Asia and South Pacific*. IEEE, 2017.

[3] Y. Matsushita, "Body Bias Grain Size Exploration for a Coarse Grained Reconfigurable Accelerator". Proc. of the 26th International Conference on Field-Programmable Logic and Applications (FPL), 2016.

[4] T. Kojima, et al. "Optimization of body biasing for variable pipelined coarse-grained reconfigurable architectures". IEICE Transactions on Information and Systems, Vol. E101-D, No. 6, June 2018.

## Is optimizing only the power consumption enough?

#### Several requirements

- Power Consumption
- Performance (Operating Frequency)
- Throughput

Multi-objective optimization gives users

- A variety of choices
- A way to balance the tradeoffs
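The "variety of choices" can be read as a set of non-dominated (Pareto-optimal) solutions. The sketch below filters such a set from a handful of candidates with two objectives, power (minimized) and throughput (maximized). The numbers are invented for illustration and are not measurements from the chip.

```python
from typing import List, Tuple

# Candidate = (power, throughput); units are arbitrary and values are made up.
Candidate = Tuple[float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b in both objectives and
    strictly better in at least one (lower power, higher throughput)."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

candidates = [(3.0, 10.0), (2.0, 8.0), (2.5, 8.0), (4.0, 10.0), (1.5, 5.0)]
front = pareto_front(candidates)
```

Here `(2.5, 8.0)` and `(4.0, 10.0)` drop out because another candidate matches their throughput at lower power; the remaining three are the tradeoff points a user would pick from.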



## Proposal: Use Multi-Objective Optimization

- Non-dominated Sorting Genetic Algorithm-II (NSGA-II)
  - Multi-Objective Genetic Algorithm
- In this work
  - 1-point crossover
  - Commonly-used probabilities [5]
    - 0.7 crossover probability
    - 0.3 mutation probability
  - 300 generations
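The operator settings listed above could be wired up roughly as follows. This is a minimal sketch, not the implementation used in this work: the flat integer gene, the helper names, and the choice to apply the mutation probability once per individual are all assumptions, and the NSGA-II selection step (non-dominated sorting and crowding distance) is omitted.

```python
import random
from typing import List, Tuple

CROSSOVER_PROB = 0.7  # from the slide
MUTATION_PROB = 0.3   # from the slide
GENERATIONS = 300     # from the slide (loop not shown here)

Gene = List[int]  # hypothetical flat integer encoding of one individual

def one_point_crossover(a: Gene, b: Gene,
                        rng: random.Random) -> Tuple[Gene, Gene]:
    """With probability CROSSOVER_PROB, swap the tails of a and b at one cut."""
    if len(a) < 2 or rng.random() >= CROSSOVER_PROB:
        return a[:], b[:]
    cut = rng.randint(1, len(a) - 1)  # cut strictly inside the gene
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(gene: Gene, n_values: int, rng: random.Random) -> Gene:
    """With probability MUTATION_PROB, overwrite one random position."""
    child = gene[:]
    if rng.random() < MUTATION_PROB:
        pos = rng.randrange(len(child))
        child[pos] = rng.randrange(n_values)
    return child

rng = random.Random(0)
parent_a = [0] * 6
parent_b = [1] * 6
child_1, child_2 = one_point_crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, 4, rng)
```

Because the parents are all zeros and all ones, any crossover shows up as a run of zeros followed by ones in `child_1`, which makes the single cut point easy to see.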



[5] L. Davis. "Adapting operator probabilities in genetic algorithms". In Proceedings of the third international conference on Genetic algorithms, pp. 61–69, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.

#### Gene & Evaluation of Individuals




#### An Implemented Real Chip "CCSOTB2"



CCSOTB2

- VPCMA Architecture
- SOTB 65nm Technology
- 5 Body Bias Domains
- Design: Verilog HDL
- Synthesis: Synopsys Design Compiler
- Place & Route: Synopsys IC Compiler

| Body Bias Domain | Coverage        |
|------------------|-----------------|
| domain1          | 1st-5th PE rows |
| domain2          | 6th PE row      |
| domain3          | 7th PE row      |
| domain4          | 8th PE row      |
| domain5          | Other parts     |
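For tooling that needs to know which bias domain a PE row belongs to, the table can be encoded as a small lookup. The function name and the 1-indexed row convention are hypothetical, and the sketch assumes the array has exactly eight PE rows, as the table suggests.

```python
def body_bias_domain(pe_row: int) -> str:
    """Return the body bias domain for a 1-indexed PE row (1..8).

    domain5 covers the non-PE-row parts of the chip, so it is never
    returned by this lookup.
    """
    if not 1 <= pe_row <= 8:
        raise ValueError(f"PE row out of range: {pe_row}")
    if pe_row <= 5:
        return "domain1"          # rows 1-5 share one domain
    return f"domain{pe_row - 4}"  # rows 6, 7, 8 -> domain2, 3, 4

row_domains = {row: body_bias_domain(row) for row in range(1, 9)}
```

The lookup makes the coarse granularity visible: only the last three rows can be biased individually, while the first five always share one voltage.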

## Preliminary Experiments



- Leakage power of each PE row is measured
  - BBV (body bias voltage): -0.8 ~ +0.4 V (step: 0.2 V)
- Maximum operating frequency
  - 30 MHz
  - Due to a bottleneck in the μ-controller
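The measured sweep corresponds to seven bias points. The sketch below enumerates them and counts how many voltage assignments exist if each PE-row domain is set independently. Treating exactly four domains (domain1 to domain4) as independently controlled is an assumption for this illustration, and the count ignores pipeline and mapping choices.

```python
from itertools import product

# Body bias voltages from -0.8 V to +0.4 V in 0.2 V steps.
# Integer tenths avoid floating-point drift from repeated addition.
bbv_values = [tenths / 10 for tenths in range(-8, 5, 2)]

# Assumption: the four PE-row domains are biased independently.
N_ROW_DOMAINS = 4

# Every possible assignment of one voltage per domain.
assignments = list(product(bbv_values, repeat=N_ROW_DOMAINS))
n_assignments = len(assignments)  # 7 ** 4
```

Even under this simplified view there are 2401 bias assignments per mapping, before pipeline structure and PE placement are varied.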



**Experimental Environment** 

## **Benchmark Applications**

| Name  | Description         |
|-------|---------------------|
| af    | 24bit alpha blender |
| gray  | 24bit gray scale    |
| sepia | 8bit sepia filter   |
| sf    | 24bit sepia filter  |

- 4 simple image processing applications
- Assuming 30 MHz frequency

#### Proposed method vs. Black-Diamond

#### Black-Diamond [7]

- Does not support pipeline control or body bias control
- Static mapping regardless of the user's requirements
- Combined with pipeline optimization [6]
  - Considering glitch effects

[6] T. Kojima, et al. "Glitch-aware variable pipeline optimization for CGRAs". ReConFig2017, pp. 1–6, Dec 2017.

[7] V. Tunbunheng, et al. "Black-diamond: a retargetable compiler using graph with configuration bits for dynamically reconfigurable architectures". In Proc. of The 14th SASIMI, pp. 412–419, 2007.

## Mapping quality




#### **Difference of mapping results (af application)**

[Figure: PE-array mappings of the af application for Black-Diamond with pipeline optimization and for the proposed method, annotated with per-row body bias voltages and PE opcodes]

#### Power reduction



## Conclusion

- A new optimization method based on a multi-objective genetic algorithm is proposed
- Three controls are considered simultaneously
  - 1. Pipeline structure control
  - 2. Body bias control
  - 3. Application mapping
- Real chip experiments show a 14.2% power reduction

## End of presentation

Thank you for your attention

Any questions?