Introduction to FPGA Circuits

Arnaud Tisserand
CNRS, IRISA laboratory, CAIRN research team
École ARCHI
Col-de-Porte, Isère
March 25–29th 2013

Part I
Introduction

Part II
FPGA Elements

Part III
Processors in FPGAs

Part IV
References

Software versus Hardware Implementation

<table>
<thead>
<tr>
<th></th>
<th>SW</th>
<th>HW</th>
</tr>
</thead>
<tbody>
<tr>
<td>FLEXIBILITY</td>
<td>excellent</td>
<td>limited</td>
</tr>
<tr>
<td>SPEED</td>
<td>slow</td>
<td>fast</td>
</tr>
<tr>
<td>AREA</td>
<td>large</td>
<td>small</td>
</tr>
<tr>
<td>ENERGY</td>
<td>huge</td>
<td>small</td>
</tr>
<tr>
<td>DEVEL. COST</td>
<td>small</td>
<td>huge</td>
</tr>
</tbody>
</table>

A. Tisserand, CNRS–IRISA–CAIRN. Introduction to FPGA Circuits
Implementation Targets

- **GPP:** general purpose processor
  - Intel Core i3-5-7 & Xeon, AMD Athlon & Opteron
  - ARM Cortex Ax, Cell, Power...

- **ASP:** application specific processor
  - DSP digital signal processors: C6000, MSC81xx...
  - network proc., security proc., power manager...

- **μcontroller**
  - 8051, AVR, ColdFire, MSP, PIC...

- **FPGA**

- **ASIC:** application-specific integrated circuit
  - full-custom circuits
    - customize all elements and layers
  - standard-cell circuits
    - functions = cells from library, only "draw" connections
  - gate-array circuits
    - 1) predefined but unconnected transistors (active elem.)
    - 2) connections using top metals

### PLA: Programmable Logic Array

- User programmable device for combinational logic (197x)
- Sum-of-products canonical form
- Crossing planes of wires before configuration
- Configuration (programming): (un)set (un)wanted connections
- Inputs \( x_i \) (\( \overline{x}_i \)), outputs \( y_j \)

\[
\begin{aligned}
p_1 &= \overline{x}_1 \cdot x_2 \\
p_2 &= x_2 \cdot x_3 \\
p_3 &= x_1 \cdot x_3 \\
p_4 &= \overline{x}_1 \cdot x_2 \cdot x_3 \\
y_1 &= p_1 + p_2 + p_3 = \overline{x}_1 \cdot x_2 + x_2 \cdot x_3 + x_1 \cdot x_3 \\
y_2 &= p_2 + p_4 = x_2 \cdot x_3 + \overline{x}_1 \cdot x_2 \cdot x_3
\end{aligned}
\]

The product terms \( p_k \) are formed in the AND plane; each output \( y_j \) is a sum of selected product terms formed in the OR plane.

NORs / NANDs are used in practice for CMOS circuits.
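
The two-plane structure can be sketched in Python. This is only an illustrative model of the example equations of this slide: the dictionary layout and the name `evaluate_pla` are assumptions made for the sketch, not part of any real PLA programming format.

```python
from itertools import product

# AND plane: each product term lists the literals it is connected to.
# Key = input index i, value True for x_i and False for the complement of x_i.
# Inputs that are not listed are left unconnected for that term.
AND_PLANE = {
    "p1": {1: False, 2: True},
    "p2": {2: True, 3: True},
    "p3": {1: True, 3: True},
    "p4": {1: False, 2: True, 3: True},
}

# OR plane: each output lists the product terms it is connected to.
OR_PLANE = {
    "y1": ["p1", "p2", "p3"],
    "y2": ["p2", "p4"],
}


def evaluate_pla(x):
    """Evaluate the PLA for inputs x, a dict mapping index i to a bool."""
    products = {
        name: all(x[i] == literal for i, literal in term.items())
        for name, term in AND_PLANE.items()
    }
    return {
        out: any(products[p] for p in terms)
        for out, terms in OR_PLANE.items()
    }


if __name__ == "__main__":
    # Print the full truth table of the configured PLA.
    for x1, x2, x3 in product([False, True], repeat=3):
        print(x1, x2, x3, evaluate_pla({1: x1, 2: x2, 3: x3}))
```

Reprogramming the device corresponds to editing the two dictionaries only; the evaluation logic (the fixed wiring) stays the same.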

Terminology

<table>
<thead>
<tr>
<th>acronym</th>
<th>type</th>
<th>prog.</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASIC</td>
<td>application-specific integrated circuit</td>
<td>N</td>
</tr>
<tr>
<td>MPGA</td>
<td>mask-programmable gate array</td>
<td>Y</td>
</tr>
<tr>
<td>PROM</td>
<td>programmable read-only memory</td>
<td>Y</td>
</tr>
<tr>
<td>EPROM</td>
<td>erasable PROM</td>
<td>Y</td>
</tr>
<tr>
<td>EEPROM</td>
<td>electrically erasable PROM</td>
<td>Y</td>
</tr>
<tr>
<td>PAL</td>
<td>programmable array logic</td>
<td>Y</td>
</tr>
<tr>
<td>PLA</td>
<td>programmable logic array</td>
<td>Y</td>
</tr>
<tr>
<td>GAL</td>
<td>generic array logic\(^1\)</td>
<td>Y</td>
</tr>
<tr>
<td>PLD</td>
<td>programmable logic device</td>
<td>Y</td>
</tr>
<tr>
<td>EPLD</td>
<td>erasable PLD</td>
<td>Y</td>
</tr>
<tr>
<td>SPLD</td>
<td>simple PLD</td>
<td>Y</td>
</tr>
<tr>
<td>CPLD</td>
<td>complex PLD</td>
<td>Y</td>
</tr>
<tr>
<td>FPGA</td>
<td>field-programmable gate array</td>
<td>Y</td>
</tr>
</tbody>
</table>

\(^1\) GALs ≠ GALS (globally asynchronous locally synchronous)
From Gate-Arrays to FPGAs

End of book:
Bob Hartmann, Paul Newhagen and Michael Magranet
Gate Arrays: Implementing LSI Technology, 1982

"The probabilities are high that someone will produce an electrically alterable logic array."

June 3rd, 1983:
Altera founded by
Bob Hartmann\(^1\), Paul Newhagen\(^1\), Michael Magranet\(^1\),
Jim Sansburry\(^2\) and Jim Hazle\(^1\)

Source: “Altera: A History of Innovation” from Altera website

\(^1\) previously at Fairchild Semiconductor, \(^2\) previously at HP.


First Commercial FPGA

1984:
**Xilinx** founded by Ross Freeman and Bernard Vonderschmitt

1985: XC2064
- CMOS 2.5 \( \mu \text{m} \) (Seiko), 85 kT
- \( \approx 1000 \) gates
- 64 CLBs, 122 FFs, LUT3
- 58 I/O (68-pin PLCC package)
- 18 MHz ext. crystal oscillator
- Config.: 12 038 bits

Source: IEEE SSC Mag. Vol. 3 No. 4 2011 p. 18 & Xilinx Data Sheet

FPGAs Application Domains

- ASIC Prototyping
- Audio, video & image processing
- Automotive
- Aviation
- Consumer Electronics
- Industrial
- Test & measures
- Networks, wired and wireless communications
- Data centers, high-performance computing and storage
- Medical
- Security and defense
- Aerospace
- . . .

"According to NASA's Jet Propulsion Laboratory in Pasadena, California, the Spirit Mars Exploration Rover (MER) launched June 10, 2003 and the Opportunity MER launched July 7, 2003 will employ some of the most advanced radiation tolerant Xilinx Virtex FPGAs once they reach Mars. The Xilinx devices will be used to control the pyrotechnic devices on the lander, and several motor control functions on the rover, including controllers for the wheels, steering, and antenna gimbals.

Chosen because of their re-programmability and density, the Virtex FPGAs serve as the 'main brain' of the motor control boards."

See also: http://www.xilinx.com/publications/archives/xcell/Xcell50.pdf

Configuration Cell Technology: Antifuse (1/2)
Principle:

Example:

- 15-20 Å nitride-oxide (NO) dielectric between 2 polysilicon layers
- 5.5 V circuit supply voltage
- 13.6 V programming voltage
- $0.8 \times 0.8 \, \mu m$ cell footprint


Configuration Cell Technology: Antifuse (2/2)

Cross section of antifuses in ACTEL (now Microsemi) FPGAs:

Unprogrammed
Programmed
Configuration Cell Technology: FLASH (1/2)

- Electrons can be trapped in the floating gate (FG)
- Threshold voltage ($V_T$) depends on the charge in the FG
- READ: apply an intermediate voltage on the control gate (CG), then sense the channel
- STORE: inject electrons into the FG
- ERASE: remove electrons from the FG (tunneling)
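
The read mechanism can be illustrated with a very simplified behavioural model in Python. All voltage values below are hypothetical, chosen only to make the example readable, and the class name `FlashCell` is an assumption of this sketch. It models an n-channel cell where trapped electrons raise the threshold voltage.

```python
from dataclasses import dataclass


@dataclass
class FlashCell:
    """Behavioural sketch of one floating-gate cell (illustrative values)."""

    vt_erased: float = 1.0   # assumed threshold with no trapped charge (V)
    vt_shift: float = 3.0    # assumed threshold increase when charged (V)
    charged: bool = False    # True when electrons are trapped in the FG

    def store(self):
        """STORE: inject electrons into the floating gate."""
        self.charged = True

    def erase(self):
        """ERASE: remove electrons from the floating gate."""
        self.charged = False

    @property
    def threshold(self):
        """Threshold voltage, which depends on the charge in the FG."""
        return self.vt_erased + (self.vt_shift if self.charged else 0.0)

    def read(self, v_cg=2.5):
        """READ: apply an intermediate control-gate voltage, sense the channel.

        Returns True when the channel conducts (control-gate voltage above
        the current threshold).
        """
        return v_cg > self.threshold


if __name__ == "__main__":
    cell = FlashCell()
    print("erased cell conducts:", cell.read())
    cell.store()
    print("programmed cell conducts:", cell.read())
```

The intermediate read voltage only distinguishes the two states if it lies between the erased and the programmed threshold, which is why it is called "intermediate".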



Configuration Cell Technology: FLASH (2/2)

FLASH cell evolution for data storage (not always applicable to FPGAs)



Configuration Cell Technology: SRAM

1-bit static RAM cell (std. CMOS techno.) for each programmable element


Overview of IC Production Economics (1/2)

\[ C = \frac{F}{N} + V \]

- \( C \) cost per circuit
- \( F \) fixed costs \( F = F_{\text{NRE}} + F_{\text{other}} \)
  - \( F_{\text{NRE}} \) non-recurring engineering costs: prototyping, masks, packaging, tooling, personnel costs, training, support, CAD tools, computers, . . .
  - \( F_{\text{other}} \) all other fixed costs: documentation, marketing, administration, after-sales, . . .
- \( N \) number of circuits to sell
- \( V \) variable cost per circuit \( V = V_{\text{process}} + V_{\text{packaging}} + V_{\text{test}} \)
  - \( V_{\text{process}} \) cost for producing one die
  - \( V_{\text{packaging}} \) package and “transformations” costs
  - \( V_{\text{test}} \) cf. specific course (depends on complexity and duration)
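
As a worked illustration of \( C = F/N + V \), the following Python sketch compares two implementation options and computes the volume at which their unit costs are equal. The cost figures are invented for the example (they are not taken from the slides), and the function names are assumptions of this sketch.

```python
def cost_per_circuit(fixed, volume, variable):
    """Unit cost C = F / N + V."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return fixed / volume + variable


def break_even_volume(fixed_a, variable_a, fixed_b, variable_b):
    """Volume N where both options have the same unit cost.

    Solving F_a / N + V_a = F_b / N + V_b gives
    N = (F_a - F_b) / (V_b - V_a).
    """
    if variable_a == variable_b:
        raise ValueError("equal variable costs: no single break-even point")
    return (fixed_a - fixed_b) / (variable_b - variable_a)


if __name__ == "__main__":
    # Hypothetical figures: high fixed cost and low unit cost ("ASIC-like")
    # versus low fixed cost and high unit cost ("FPGA-like").
    asic = {"fixed": 2_000_000.0, "variable": 5.0}
    fpga = {"fixed": 100_000.0, "variable": 50.0}

    n_star = break_even_volume(asic["fixed"], asic["variable"],
                               fpga["fixed"], fpga["variable"])
    print(f"break-even volume: about {n_star:.0f} circuits")
    for n in (1_000, 10_000, 100_000, 1_000_000):
        c_asic = cost_per_circuit(asic["fixed"], n, asic["variable"])
        c_fpga = cost_per_circuit(fpga["fixed"], n, fpga["variable"])
        print(f"N = {n:>9}: ASIC-like {c_asic:10.2f}, FPGA-like {c_fpga:10.2f}")
```

Below the break-even volume the option with the lower fixed cost is cheaper per circuit; above it, the option with the lower variable cost wins.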

Economic Motivations for FPGAs vs ASICs (1/4)

Time to market (TTM):

- ASIC
- FPGA

Economic Motivations for FPGAs vs ASICs (2/4)

Early arrival on the market:

- FPGA
- ASIC

Longer product life due to reconfigurability:

Economic Motivations for FPGAs vs ASICs (3/4)

FPGA domain vs ASIC domain (arbitrary scales):

[Figure: total cost versus volume for "old" and "new" FPGAs and ASICs; volume axis marked n, n + m and n + 2m]

ASIC advantages:
- very high performance and very low power
- low unit cost
- small form factor

FPGA advantages:
- simple design and verification cycles
- no NRE costs
- low (re)design risk
- fast time to market
- reconfiguration (flexible systems)

Economic Aspects: Altera Corp. Values for 2012 (1/2)

Net sales: \$1,783,035 thousand (≈ \$1.78 billion); ≈ 2,600 employees worldwide

Main markets:
- Telecom & Wireless
- Industrial Automation, Military & Automotive
- Networking, Computer & Storage
- Other

Geographic sales distribution:
- Asia Pacific
- Europe, Middle East and Africa
- Americas
- Japan

Economic Aspects: Altera Corp. Values for 2012 (2/2)

Product categories (by type):
- 84% FPGA
- 9% CPLD
- 7% Other

Product categories (by generation):
- 32% New (Stratix IV/V, Arria II/V, Cyclone IV/V, …)
- 30% Mainstream (Stratix III, Cyclone III, …)
- 38% Mature and Other

Source: Altera website http://investor.altera.com/, News Release
Economic Aspects: Xilinx Corp. Values for 2012 (1/2)

Net sales: \$2,240,700 thousand (≈ \$2.24 billion); ≈ 3,400 employees worldwide

Main markets:
- 47% Communications & Data Center
- 36% Industrial, Aerospace & Defense
- 15% Broadcast, Consumer & Auto
- 2% Other

Geographic sales distribution:
- 34% Asia Pacific
- 32% North America
- 25% Europe
- 10% Japan

Source: Xilinx website http://investor.xilinx.com/releases.cfm, Investor Factsheet


Economic Aspects: Xilinx Corp. Values for 2012 (2/2)

[Figure: sales by product category: Mainstream, New, Base, Support]

Source: Xilinx website http://investor.xilinx.com/releases.cfm, Investor Factsheet

FPGA Companies

Achronix Semiconductor www.achronix.com
Altera www.altera.com
Lattice Semiconductor www.latticesemi.com
Microsemi (previously Actel) www.microsemi.com
QuickLogic www.quicklogic.com
Tabula www.tabula.com
Xilinx www.xilinx.com

FPGA Companies Revenues over 1999–2012

Source: FPGA companies websites
Reconfigurable Architectures

- **Fine-grain** reconfigurable architectures:
  - Def.: reconfiguration at the bit/signal/gate level
  - Pros: huge flexibility
  - Cons: high configuration cost (area, time)
  - Examples: FPGAs

- **Coarse-grain** reconfigurable architectures:
  - Def.: reconfiguration at the function/block level
  - Pros: small configuration cost
  - Cons: limited flexibility
  - Examples: Tensilica
  - Examples: DART (IRISA), Systolic Ring (LIRMM), . . .

---

**Logic Block**

Resources:
- input and output pins
- configurable logic function generator(s) (FG)
  - $f$ is an arbitrary function of the inputs
- flip-flop(s) for intermediate registers and pipelining
- configurable internal selection and routing between resources

---

**Look Up Tables (LUT)**

- Programmable memory cells: configuration of the truth table for all possible values
- Address bits: selection of the truth table line
- Typical LUT types: 1 or 2 outputs, 3 to 6 inputs
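
The LUT principle (memory cells hold the truth table, inputs act as address bits) can be sketched in Python as follows. The class name `LUT` and its helper `from_function` are assumptions made for this illustration; input `a_0` is arbitrarily taken as the least significant address bit.

```python
class LUT:
    """Behavioural sketch of a single-output n-input look-up table."""

    def __init__(self, n_inputs, cells):
        if len(cells) != 2 ** n_inputs:
            raise ValueError("a LUT-n needs exactly 2**n configuration cells")
        self.n_inputs = n_inputs
        self.cells = [int(bool(c)) for c in cells]  # programmable memory cells

    @classmethod
    def from_function(cls, n_inputs, f):
        """Configure the cells from an arbitrary Boolean function f."""
        cells = []
        for address in range(2 ** n_inputs):
            bits = [(address >> i) & 1 for i in range(n_inputs)]
            cells.append(f(*bits))
        return cls(n_inputs, cells)

    def __call__(self, *inputs):
        """Inputs form the address selecting one line of the truth table."""
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.cells[address]


if __name__ == "__main__":
    # A LUT-4 configured as a 4-input XOR (parity) function.
    parity = LUT.from_function(4, lambda a0, a1, a2, a3: a0 ^ a1 ^ a2 ^ a3)
    print("configuration cells:", parity.cells)
    print("parity(1, 0, 1, 1) =", parity(1, 0, 1, 1))
```

Any function of the inputs costs the same 2^n cells, which is why the LUT size (3 to 6 inputs) is a key architectural trade-off.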
One Possible Implementation of a LUT-3

Examples of LUT-4 Configurations

<table>
<thead>
<tr>
<th>\(a_3\)</th>
<th>\(a_2\)</th>
<th>\(a_1\)</th>
<th>\(a_0\)</th>
<th>\(f\)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 0 0</td>
<td>0 0 0 0</td>
<td>0 0 0 0</td>
<td>0 0 0 0</td>
<td>1 0 0 0</td>
</tr>
<tr>
<td>1 1 1 1</td>
<td>1 1 1 1</td>
<td>1 1 1 1</td>
<td>1 1 1 1</td>
<td>0 0 0 0</td>
</tr>
<tr>
<td>1 1 1 1</td>
<td>1 1 1 1</td>
<td>1 1 1 1</td>
<td>1 1 1 1</td>
<td>0 0 1 0</td>
</tr>
<tr>
<td>1 0 1 0</td>
<td>1 0 1 0</td>
<td>1 0 1 0</td>
<td>1 0 1 0</td>
<td>1 0 1 0</td>
</tr>
</tbody>
</table>

Actel ACT1 and ACT3 Logic Blocks

Source: Xilinx data sheet XC 2064/2018 Logic Cell Array p. 2.64

Source: Actel data sheet
**Carry Propagation Problem**

Parallel prefix adders:

Building high-speed adders is very costly using flexible logic blocks and general routing resources.

Most applications use many adders.

**Carry Propagation Solution**

Add dedicated resources for high-speed addition.

There is a tradeoff between performance, cost and flexibility.
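
To make the carry propagation problem concrete, the Python sketch below contrasts a ripple-carry adder (n sequential carry stages) with a parallel-prefix adder. The prefix scheme shown is a Kogge-Stone style network, chosen here only as one well-known example; the slides do not state which prefix structure was pictured, and the function names are assumptions of this sketch.

```python
def ripple_carry_add(a, b, n):
    """Reference n-bit adder: the carry ripples through n stages in sequence."""
    carry, result = 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result, carry


def parallel_prefix_add(a, b, n):
    """Kogge-Stone style n-bit adder; returns (sum, carry_out, prefix_levels).

    Operands are assumed to satisfy 0 <= a, b < 2**n.
    """
    mask = (1 << n) - 1
    generate = a & b & mask        # g_i = a_i AND b_i
    propagate = (a ^ b) & mask     # p_i = a_i XOR b_i
    half_sum = propagate
    levels, distance = 0, 1
    while distance < n:
        # Combine each (g, p) pair with the pair located `distance` bits below.
        generate = (generate | (propagate & (generate << distance))) & mask
        propagate = (propagate & (propagate << distance)) & mask
        distance *= 2
        levels += 1
    # After the last level, bit i of `generate` is the carry out of position i.
    total = (half_sum ^ (generate << 1)) & mask
    carry_out = (generate >> (n - 1)) & 1
    return total, carry_out, levels


if __name__ == "__main__":
    n = 16
    a, b = 0xBEEF, 0x1234
    print(ripple_carry_add(a, b, n))
    print(parallel_prefix_add(a, b, n))
```

The prefix adder needs only about log2(n) levels instead of n carry stages, but each level uses many cells and long wires, which is what makes it expensive on generic logic blocks and general routing.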

**Old Quicklogic Logic Block**

**Xilinx XC3000 Logic Block**

**Source:** Quicklogic data sheet

**Source:** Xilinx data sheet

Carry Logic in a Xilinx XC 4000 E

Xilinx Spartan II E Configurable Logic Block

Shift Registers in Logic Blocks

Towards Larger LUTs
Adaptive Logic Module in an Altera Stratix V

Xilinx Virtex 7 Configurable Logic Block (1/3)

Resources per CLB:

<table>
<thead>
<tr>
<th></th>
<th>Slices</th>
<th>LUT-6</th>
<th>Flip-Flops</th>
<th>Arithmetic &amp; Carry Chains</th>
<th>Distributed RAM</th>
<th>Shift Register</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>2</td>
<td>8</td>
<td>16</td>
<td>2</td>
<td>256 bits</td>
<td>128 bits</td>
</tr>
</tbody>
</table>

Xilinx Virtex 7 Configurable Logic Block (2/3)

Xilinx Virtex 7 Configurable Logic Block (3/3)

SLICEM

SLICEL
Evolution of Xilinx CLBs

[Figure: simplified logic block diagrams (LUTs, D flip-flops, carry propagation logic) for Xilinx XC 2000, XC 3000, XC 4000, Virtex (slice) and Virtex 7 (slice)]

Typical Routing Overview

Routing Elements in Xilinx XC 4000 E

Source: Xilinx data sheet

Source: Xilinx data sheet
Routing Elements in a Xilinx Spartan II E

Source: Xilinx data sheet

Routing Elements in Xilinx Virtex II

Source: Xilinx data sheet

Routing Elements in Xilinx Virtex 4 & Virtex 5

Source: Xilinx data sheet

Cross Section of a Virtex 5 FPGA 65 nm

12 metal layers (11 copper, 1 aluminium), 300 mm wafers, 1 V core supply

Source: http://www.eetimes.com/showArticle.jhtml?articleID=197003451
I/O Have to Support Various Interface Standards

For “old” FPGAs:

[Figure: input (\(V_{in}\)) and output (\(V_{out}\)) voltage levels on a 0 V to 4.6 V scale]

Some of the Supported Interface Standards in Virtex 7

- HSTL_I_DCI
- DIFF_HSTL_I_DCI
- SSTL18_I_DCI
- DIFF_SSTL18_I_DCI
- LVDCI_J_DCI

Source: Xilinx data sheet ug471_7Series_SelectIO.pdf

Some of the Supported Interface Standards in Altera Stratix V

- HSTL_J_DCI
- DIFF_HSTL_J_DCI
- SSTL18_J_DCI
- DIFF_SSTL18_J_DCI
- LVDCI_J_DCI

Source: Altera data sheet atx5.51006 p. 5.15
Packages for Xilinx Virtex II

<table>
<thead>
<tr>
<th>package</th>
<th>CS144</th>
<th>FG256</th>
<th>FG456</th>
<th>FG676</th>
<th>BG575</th>
<th>BG728</th>
<th>FF996</th>
<th>FF1152</th>
<th>FF1517</th>
<th>BF957</th>
</tr>
</thead>
<tbody>
<tr>
<td>pitch (mm)</td>
<td>0.80</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.27</td>
<td>1.27</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.27</td>
</tr>
<tr>
<td>size (mm)</td>
<td>12x12</td>
<td>17x17</td>
<td>23x23</td>
<td>27x27</td>
<td>31x31</td>
<td>31x31</td>
<td>31x31</td>
<td>35x35</td>
<td>35x35</td>
<td>40x40</td>
</tr>
<tr>
<td>nb. I/O</td>
<td>92</td>
<td>172</td>
<td>324</td>
<td>484</td>
<td>408</td>
<td>516</td>
<td>624</td>
<td>824</td>
<td>1,108</td>
<td>684</td>
</tr>
</tbody>
</table>

Some Packages used in FPGAs

FF1517 Flip-Chip Fine-Pitch BGA

Transceivers in Xilinx Virtex 6 & Virtex 7

Data rates (in Gb/s):

<table>
<thead>
<tr>
<th>type</th>
<th>40 nm</th>
<th>28 nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>type</td>
<td>Spartan 6</td>
<td>Virtex 6</td>
</tr>
<tr>
<td>GTP</td>
<td>3.125</td>
<td>6.6</td>
</tr>
<tr>
<td>GTX</td>
<td>6.6</td>
<td>12.5</td>
</tr>
<tr>
<td>GTH</td>
<td>11.18</td>
<td>13.1</td>
</tr>
<tr>
<td>GTZ</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Clock Generation in a Virtex II

DCM: digital clock manager
Possible clock division in a Virtex II: 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 9, 10, 11, 12, 13, 14, 15, 16.
Clock Distribution in a Virtex II

Source: Xilinx data sheet

Clocking Resources in a Virtex 6 (1/2)

MMCM: Mixed-Mode Clock Manager

\[ F_{\text{out}} = F_{\text{in}} \times \frac{D_1}{D_0} \]
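
A small numerical illustration of this relation, in Python. The slide does not define \( D_1 \) and \( D_0 \); the sketch assumes they are positive integer multiply and divide factors, and the search range used in `best_ratio` is hypothetical (real devices impose their own limits on each factor and on internal frequencies).

```python
from fractions import Fraction


def output_frequency(f_in, d1, d0):
    """F_out = F_in * D1 / D0, computed exactly with rational arithmetic."""
    if d1 <= 0 or d0 <= 0:
        raise ValueError("D1 and D0 must be positive integers")
    return Fraction(f_in) * Fraction(d1, d0)


def best_ratio(f_in, f_target, max_factor):
    """Exhaustively search D1, D0 in 1..max_factor for the closest F_out.

    Ties are broken in favour of the smallest factors.
    """
    candidates = (
        (abs(output_frequency(f_in, d1, d0) - f_target), d1, d0)
        for d1 in range(1, max_factor + 1)
        for d0 in range(1, max_factor + 1)
    )
    _, d1, d0 = min(candidates)
    return d1, d0


if __name__ == "__main__":
    d1, d0 = best_ratio(100, 150, 8)      # frequencies in MHz
    print(d1, d0, float(output_frequency(100, d1, d0)))
```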

6 to 18 clock regions depending on the FPGA size

Source: Xilinx data sheet: Virtex 6 FPGA Clocking Resources (UG362) p. 40

Clocking Resources in a Virtex 6 (2/2)

MMCM Application Example:

Source: Xilinx data sheet: Virtex 6 FPGA Clocking Resources (UG362) p. 61

Dedicated Hard Blocks of RAM in a Virtex II

- Single or dual port 18 Kb BRAM (Block RAM)
- 4 (XC2V40) to 168 (XC2V8000) BRAMs
- Possible configurations for each BRAM:
  16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18, 512 x 36
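
The aspect ratios listed above can be checked with a few lines of Python: each configuration trades depth (address bits) for width. The sketch only uses the depth and width values from the list; the helper name `describe` is an assumption.

```python
# (depth in words, width in bits) for one 18 Kb block RAM
CONFIGS = [
    (16 * 1024, 1),
    (8 * 1024, 2),
    (4 * 1024, 4),
    (2 * 1024, 9),
    (1024, 18),
    (512, 36),
]


def describe(depth, width):
    """Return (number of address bits, total number of stored bits)."""
    address_bits = (depth - 1).bit_length()
    return address_bits, depth * width


if __name__ == "__main__":
    for depth, width in CONFIGS:
        address_bits, total = describe(depth, width)
        print(f"{depth:>6} x {width:<2}: {address_bits} address bits, "
              f"{total} bits = {total / 1024:g} Kb")
```

The 1-, 2- and 4-bit wide configurations store 16 Kb, while the 9-, 18- and 36-bit wide ones use the full 18 Kb, the extra bit per byte being usable for parity.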

Source: Xilinx Virtex II data sheet (DS083) pp. 44–46
BRAMs in a Virtex 7

- 36 Kb BRAMs
- 135 (XC7A100T) to 1880 (XC7VX1140T) BRAMs
- RAM or FIFO configurations
- Cascade of 2 blocks
- ECC configurable support

Source: Xilinx Virtex 7 memory resource data sheet (UG473)

Distributed RAM in a Virtex 7

SLICEMs distributed RAM configuration in a Virtex 7 CLB

<table>
<thead>
<tr>
<th>#word × wordsize</th>
<th>ports</th>
<th># LUT</th>
</tr>
</thead>
<tbody>
<tr>
<td>conf. R W</td>
<td></td>
<td></td>
</tr>
<tr>
<td>32 × 1 S</td>
<td>@1</td>
<td>@1</td>
</tr>
<tr>
<td>32 × 1 D</td>
<td>@0 @2</td>
<td>@1</td>
</tr>
<tr>
<td>32 × 2 Q</td>
<td>@0 @2 @4 @0</td>
<td>@1</td>
</tr>
<tr>
<td>32 × 6 SDP</td>
<td>@1</td>
<td>@2</td>
</tr>
<tr>
<td>64 × 1 S</td>
<td>@1</td>
<td>@1</td>
</tr>
<tr>
<td>64 × 1 D</td>
<td>@0 @2</td>
<td>@1</td>
</tr>
<tr>
<td>64 × 1 Q</td>
<td>@0 @0 @2 @0</td>
<td>@1</td>
</tr>
<tr>
<td>64 × 3 SDP</td>
<td>@0</td>
<td>@2</td>
</tr>
<tr>
<td>128 × 1 S</td>
<td>@1</td>
<td>@1</td>
</tr>
<tr>
<td>128 × 1 D</td>
<td>@0 @2</td>
<td>@1</td>
</tr>
<tr>
<td>256 × 1 Q</td>
<td>@0 @0 @0 @0</td>
<td>@1</td>
</tr>
</tbody>
</table>

Source: Xilinx Virtex 7 Configurable Logic Block User Guide (UG474)

Yet Another Dedicated Hard Block Type

Acc ← Acc ± X × Y

Multiply-and-accumulate is widely used in many applications:
- FIR and IIR filters
- FFT
- Matrix / vector computations
- Polynomial approximations
- Discrete Cosine Transform (DCT)
- ...

⇒ dedicated blocks
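
As an illustration of why the operation Acc ← Acc ± X × Y is so common, the Python sketch below writes a direct-form FIR filter as a sequence of multiply-and-accumulate steps. It is a software model only; the function names `mac` and `fir` are assumptions of this example, and a hardware DSP block would operate on fixed-width operands.

```python
def mac(acc, x, y, subtract=False):
    """One multiply-and-accumulate step: acc + x*y, or acc - x*y."""
    return acc - x * y if subtract else acc + x * y


def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n-k]."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:            # samples before the start are taken as 0
                acc = mac(acc, c, samples[n - k])
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # 2-tap moving sum of a short ramp signal.
    print(fir([1, 2, 3, 4], [1, 1]))
```

Each output sample costs one MAC per filter tap, so a filter with many taps maps naturally onto a chain of dedicated multiply-accumulate blocks.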

Source: Altera data sheet

DSP Block Configuration in an Altera Stratix V

Variable precision DSP block:
- 2 multipliers 18 × 18 bits per block
- pre-adder/subtractor before the multipliers
- adder/subtractor after the multipliers
- e.g. (a × b) ± (c × d)
- 64-bit programmable accumulator (split 2 × 32)

<table>
<thead>
<tr>
<th>#op.</th>
<th>op.</th>
<th>#op.</th>
<th>op.</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td>x(9) × y(9)</td>
<td>3</td>
<td>x(18) × y(18)</td>
</tr>
<tr>
<td>2</td>
<td>x(16) × y(16)</td>
<td>2</td>
<td>x(27) × y(27)</td>
</tr>
<tr>
<td>1</td>
<td>x(18) × y(18)</td>
<td>1</td>
<td>x(36) × y(36)</td>
</tr>
<tr>
<td></td>
<td>x(27) × y(27)</td>
<td></td>
<td>x(18) × C y(18)</td>
</tr>
<tr>
<td></td>
<td>x(36) × y(18)</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

x(n) denotes an n-bit operand x

Source: Altera data sheet
Xtreme DSP48 Block in a Virtex 4 FPGA

Source: Xilinx XtremeDSP for Virtex-4 FPGAs User Guide (UG073) p. 55


DSP Block in a Virtex 7 FPGA

Source: Xilinx 7 Series DSP48E1 Slice User Guide (UG479) p. 12


Technology Evolution for Altera FPGAs

Source: Altera data sheets

Architecture Evolution for Altera FPGAs

Altera FPGAs Evolution

Source: Altera data sheets


Xilinx Spartan II Overview

Xilinx Spartan II FPGAs

Packages for the Spartan II E family: TQ144, PQ208, FT256, FG456, FG676

Configuration bitstream size (in bits):

<table>
<thead>
<tr>
<th>XC2S50E</th>
<th>XC2S100E</th>
<th>XC2S150E</th>
<th>XC2S200E</th>
<th>XC2S300E</th>
<th>XC2S400E</th>
<th>XC2S600E</th>
</tr>
</thead>
<tbody>
<tr>
<td>630 048</td>
<td>863 840</td>
<td>1 134 496</td>
<td>1 442 016</td>
<td>1 875 648</td>
<td>2 693 440</td>
<td>3 961 632</td>
</tr>
</tbody>
</table>
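
To give an order of magnitude for these sizes, the Python sketch below converts each bitstream length to KiB and estimates a configuration time. The configuration clock (50 MHz) and the one-bit-per-cycle serial loading are assumptions made purely for illustration; they are not taken from the slides or from a data sheet.

```python
# Bitstream sizes in bits, copied from the table above.
BITSTREAM_BITS = {
    "XC2S50E": 630_048,
    "XC2S100E": 863_840,
    "XC2S150E": 1_134_496,
    "XC2S200E": 1_442_016,
    "XC2S300E": 1_875_648,
    "XC2S400E": 2_693_440,
    "XC2S600E": 3_961_632,
}


def size_kib(bits):
    """Bitstream size in KiB (1 KiB = 8192 bits)."""
    return bits / 8 / 1024


def configuration_time_ms(bits, clock_hz, bits_per_cycle=1):
    """Loading time in ms, assuming a fixed number of bits per clock cycle."""
    return bits / (clock_hz * bits_per_cycle) * 1e3


if __name__ == "__main__":
    ASSUMED_CLOCK_HZ = 50e6   # hypothetical configuration clock
    for device, bits in BITSTREAM_BITS.items():
        print(f"{device:9}: {size_kib(bits):7.1f} KiB, "
              f"{configuration_time_ms(bits, ASSUMED_CLOCK_HZ):6.2f} ms")
```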

Xilinx Virtex II

Largest device: 112 × 108 CLBs, 168 Mult. 18 × 18 bits, 168 BRAMs, 12 DCMs. Total equiv. ≈ 8 Mgates!

Source: Xilinx data sheets

Xilinx Spartan 3

Achronix Speedster FPGA 1.5 GHz (1/2)

Source: Achronix Speedster FPGA Family (DS001) p. 4

Xilinx FPGAs Evolution

<table>
<thead>
<tr>
<th></th>
<th>Virtex</th>
</tr>
</thead>
<tbody>
<tr>
<td>year</td>
<td>—</td>
</tr>
<tr>
<td>techno. [nm]</td>
<td>220</td>
</tr>
<tr>
<td>$V_{DD}$ [V]</td>
<td>2.5</td>
</tr>
<tr>
<td>max. freq. [MHz]</td>
<td>200</td>
</tr>
<tr>
<td>slices</td>
<td>12 k</td>
</tr>
<tr>
<td># DSP</td>
<td>—</td>
</tr>
<tr>
<td>blocks RAM [Mb]</td>
<td>0.13</td>
</tr>
<tr>
<td># transceivers</td>
<td>—</td>
</tr>
<tr>
<td>throughput [Gb·s$^{-1}$]</td>
<td>—</td>
</tr>
<tr>
<td>I/O</td>
<td>512</td>
</tr>
</tbody>
</table>

Source: Xilinx data sheets
Processors in FPGAs

Processors are widely used in electronic systems and FPGAs:

- the main processor at system level (32-bit, MMU, cache, “OS friendly”, . . .)
- small processors (8/16-bit) used for local control (for coprocessors/accelerators), “smart FSMs”
- FPGAs are larger and larger but implementing embedded processors is still costly (area and design time)
- Embedded and low-power systems $\Rightarrow$ single chip solution

Two solutions for embedded processors in FPGAs:

- dedicated hard blocks processor core(s)
- soft-core processors (synthesized on the FPGA resources)
### Hard Processors Evolution in Xilinx FPGAs

<table>
<thead>
<tr>
<th>Processor</th>
<th>Virtex</th>
<th>Zynq</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>II Pro</td>
<td>4</td>
</tr>
<tr>
<td><strong>PowerPC 405</strong></td>
<td>1 or 2</td>
<td>1 double</td>
</tr>
<tr>
<td><strong>max. freq. [MHz]</strong></td>
<td>300</td>
<td>550</td>
</tr>
<tr>
<td><strong>#pipe. stages</strong></td>
<td>5</td>
<td>7</td>
</tr>
<tr>
<td><strong>L1 [KB]</strong></td>
<td>16I+16D</td>
<td>32I+32D</td>
</tr>
<tr>
<td><strong>L2 [KB]</strong></td>
<td>—</td>
<td>—</td>
</tr>
<tr>
<td><strong>FPU</strong></td>
<td>no</td>
<td>no</td>
</tr>
</tbody>
</table>

**SOURCE:** Xilinx data sheets

### Soft-Core Processors

**Definition:** embedded processor implemented using typical FPGA resources (logic blocks, RAM blocks, DSP blocks, …)

**Examples:**

<table>
<thead>
<tr>
<th>Processor</th>
<th>archi.</th>
<th>#bit</th>
<th>pipe. stages</th>
<th>CPI</th>
<th>MMU</th>
<th>FPU</th>
<th>license</th>
</tr>
</thead>
<tbody>
<tr>
<td>NIOS II f</td>
<td>NIOS II</td>
<td>32</td>
<td>6</td>
<td>1</td>
<td>yes</td>
<td>opt.</td>
<td>prop. Altera</td>
</tr>
<tr>
<td>NIOS II s</td>
<td>NIOS II</td>
<td>32</td>
<td>5</td>
<td>1</td>
<td>no</td>
<td>no</td>
<td>prop. Altera</td>
</tr>
<tr>
<td>NIOS II e</td>
<td>NIOS II</td>
<td>32</td>
<td>no</td>
<td>6</td>
<td>no</td>
<td>no</td>
<td>prop. Altera</td>
</tr>
<tr>
<td>MicroBlaze</td>
<td>MicroBlaze</td>
<td>32</td>
<td>3 / 5</td>
<td>1</td>
<td>opt.</td>
<td>opt.</td>
<td>prop. Xilinx</td>
</tr>
<tr>
<td>PicoBlaze</td>
<td>PicoBlaze</td>
<td>8</td>
<td>no</td>
<td>2</td>
<td>no</td>
<td>no</td>
<td>prop. Xilinx</td>
</tr>
<tr>
<td>Cortex M1</td>
<td>ARM V6</td>
<td>32</td>
<td>3</td>
<td>1</td>
<td>yes</td>
<td>yes</td>
<td>prop. ARM</td>
</tr>
<tr>
<td>LEON 2</td>
<td>SPARC V8</td>
<td>32</td>
<td>5</td>
<td>1</td>
<td>yes</td>
<td>yes</td>
<td>LGPL</td>
</tr>
<tr>
<td>LEON 3</td>
<td>SPARC V8</td>
<td>32</td>
<td>7</td>
<td>1</td>
<td>yes</td>
<td>yes</td>
<td>GPL</td>
</tr>
<tr>
<td>OpenRISC 1200</td>
<td>OpenRISC 1200</td>
<td>32</td>
<td>5</td>
<td>1</td>
<td>yes</td>
<td>yes</td>
<td>LGPL</td>
</tr>
</tbody>
</table>

**SOURCE:** Xilinx data sheets

---

**PowerPC 405 in a Xilinx Virtex II (2/2)**

- **Source:** Xilinx data sheet (DS083) p. 63

**Xilinx Zynq 7000**

- hard block: dual-core ARM Cortex A9
- 28 nm implementation
- 667, 733, 800 MHz and 1 GHz
- 1GB address space
- 64-bit operations
- Cache L1 I 32 KB 4-way set-associative
- Cache L1 D 32 KB 4-way set-associative
- Cache L2 I+D 512 KB 8-way set-associative
- On-chip boot ROM
- 256 KB on-chip RAM

**SOURCE:** Xilinx data sheet (DS190)

---

Xilinx MicroBlaze (1/2)

Supported FPGAs: Spartan 3/6, Virtex-4/5/6/7, Artix-7, Kintex-7, Zynq-7000

Various supported and optional features:
- Execution Hardware Acceleration
- Instruction Set Extensions
- Cache size configurable: 2kB to 64kB (BRAM)
- Microcache size configurable: 64B to 1024B (distributed RAM)
- Direct mapped write-through or write-back operation
- Branch optimizations and prediction logic
- Error Correction Codes (ECC)
- Parity protection on internal BRAMs and caches
- 32-bit Floating Point Unit (FPU) IEEE 754
- Memory Management Unit (MMU)
- MPU mode for region protection for secure RTOS applications
- JTAG control via a debug support core

Source: Xilinx LogiCORE IP MicroBlaze (DS865)

Part IV

References

Journals

- ACM Transactions on Reconfigurable Technology and Systems (TRETS)
- IEEE Transactions on Circuits and Systems (TCAS)
- IEEE Transactions on Computers (TC)
- IEEE Transactions on VLSI Systems (TVLSI)
- ...
Conferences

Domain specific conferences:
- FPGA: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
- FCCM: IEEE International Symposium on Field-Programmable Custom Computing Machines
- FPL: International Conference on Field Programmable Logic and Applications
- FPT: International Conference on Field-Programmable Technology

Sessions in general conferences:
- DAC: Design Automation Conference
- DATE: Design, Automation, and Test in Europe conference
- CHES: Workshop on Cryptographic Hardware and Embedded Systems

Books on FPGAs (1/2)

FPGA Design
Best Practices for Team-Based Design
Philip Simpson
2010
Springer
ISBN: 978–1–4419–6339–0

Books on FPGAs (2/2)

FPGA Design Automation
A Survey
Deming Chen, Jason Cong and Peichen Pan
2006
Now Publishers Inc
ISBN: 978-1933019383

FPGA Architecture
Survey and Challenges
Russell Tessier, Jonathan Rose and Ian Kuon
2008
Now Publishers Inc
ISBN: 978–1601981264
Good Books: Circuit Technology & Design

**CMOS VLSI Design**
*A Circuits and Systems Perspective*
Neil Weste and David Harris
3rd edition, 2004
Addison Wesley

---

**Other Topics on FPGAs**

- Configuration
- Partial dynamic reconfiguration
- Low-power aspects
- Security aspects
- Programming
- CAD tools
- FPGA to ASIC conversion solutions
- ...

---

The end, questions?

Contact:

- mailto:arnaud.tisserand@irisa.fr
- http://people.irisa.fr/Arnaud.Tisserand/
- CAIRN Group http://www.irisa.fr/cairn/
- IRISA Laboratory, CNRS–INRIA–Univ. Rennes 1
  6 rue Keramont, CS 80518, F-22305 Lannion cedex, France

Thank you