Cyrille CHAVET

Université Bretagne Sud

SAGE - Static Address Generation Easing

For high-throughput applications, turbo-like iterative decoders are implemented with parallel architectures. However, to be efficient, parallel architectures must avoid access collisions, i.e. concurrent read/write accesses must not target the same memory bank. This consideration applies to the two main classes of turbo-like codes, which are Low Density Parity Check (LDPC) codes and Turbo-Codes. In this research we propose methodologies which find a collision-free mapping of the variables in the memory banks and which optimize the resulting interleaving architecture.
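To make the collision constraint concrete, the following minimal sketch checks whether a given assignment of data to memory banks is collision-free. It is an illustration only, not the SAGE algorithm: the function names, the windowed access pattern (each processing element handles one contiguous window of the block) and the small interleaver are assumptions chosen for the example.

```python
from typing import List, Sequence


def access_schedule(order: Sequence[int], parallelism: int) -> List[List[int]]:
    """Group an access order into time steps of `parallelism` concurrent accesses.

    Assumption: processing element p handles the p-th contiguous window of
    `order`, so at step t the data order[t + p * window] are accessed together.
    """
    window = len(order) // parallelism
    return [[order[t + p * window] for p in range(parallelism)]
            for t in range(window)]


def is_collision_free(bank_of: Sequence[int],
                      schedules: Sequence[List[List[int]]]) -> bool:
    """Return True if no time step accesses the same bank twice."""
    for schedule in schedules:
        for step in schedule:
            banks = [bank_of[datum] for datum in step]
            if len(set(banks)) != len(banks):
                return False
    return True


# Hypothetical block of 8 data, 2 processing elements, 2 memory banks.
natural = list(range(8))
interleaved = [0, 2, 4, 6, 1, 3, 5, 7]  # illustrative interleaving rule
schedules = [access_schedule(natural, 2), access_schedule(interleaved, 2)]

naive_mapping = [i // 4 for i in range(8)]      # first half in bank 0, rest in bank 1
balanced_mapping = [0, 1, 0, 1, 1, 0, 1, 0]     # hand-built mapping for this example

naive_ok = is_collision_free(naive_mapping, schedules)
balanced_ok = is_collision_free(balanced_mapping, schedules)
```

In this example the naive mapping collides in the interleaved pass (data 0 and 1 are read together and both sit in bank 0), while the second mapping serves both the natural and the interleaved access order without conflict.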

Three main classes of approaches are explored:
SAGE v0.2 : dld
Goal : We present a new memory mapping approach dedicated to any block-based, parallel memory system. It is able to generate a conflict-free memory mapping and to optimize the resulting interconnection architecture (by targeting user-defined steering components), even if the interleaving rules or the communication schemes do not intrinsically allow it.
Reference paper: Paper under review

SAGE v0.1 : dld
Goal : In this paper we propose a methodology which finds a collision-free mapping of the variables in the memory banks and which optimizes the resulting interleaving architecture by targeting a user-defined architecture, if the interleaving enables it. This approach is dedicated to turbo-like codes (each processing pass, with its in/out order, is performed separately for each data block).
Last SW releases :
Reference paper: ICASSP

STAR - Space-Time AdapteR

Digital Signal Processing (DSP) applications are now widely used, from automotive to wireless communications. Ever-growing design complexity, performance requirements, and constraints on design cost and power consumption still require significant parts of a design to be implemented as a set of dedicated hardware accelerators. A classical complex DSP application architecture uses several complex processing elements, many memories and data mixing modules (interleavers for Turbo-Codes, spatial redundancy blocks for OFDM/MIMO systems...), and relies on a point-to-point communication network for communication between processing elements. Such a system may also have to integrate several applications in a single architecture ((re)configurable systems). Today, the cost of these systems in terms of memory elements is very high, which is why designers try to reduce the size of the embedded buffers in order to reduce the overall design area and power consumption and to improve design performance. In our work, we focus on optimizing the communication interfaces between components. This problem can be seen as the synthesis of (1) interfaces for IP core integration, (2) data mixing blocks (such as interleavers) with multi-mode architectures, and (3) (re)configurable datapaths in high-level synthesis flows.
We propose a design methodology to automatically generate and optimize a communication adapter named Space-Time AdapteR (STAR). Our design flow takes as input (1) a timing diagram (constraint file), (2) a C description of the I/O data scheduling (an interleaving formula) together with user requirements (throughput, latency...), or (3) a set of scheduled and bound Control Data Flow Graphs (CDFGs). It formalizes the communication constraints through a Multi-Modes Resource Constraints Graph (MMRCG). The properties of the MMRCG enable an efficient exploration of the architecture space to generate a Register Transfer Level (RTL) STAR component.

The STAR architecture is composed of a datapath (using FIFOs, LIFOs and/or registers) and the associated control state machines. Spatial adaptation (a data item can be sent from any input port to one or several output ports) is performed by interconnection logic. Timing adaptation (data reordering) is performed by the storage elements. The STAR component uses a LIS (Latency Insensitive System) interface, which makes it possible to implement a clock-gating mechanism. The proposed design flow can generate multi-mode architectures.
The design flow is based on the following tools:
- StarTor takes as input a C-level algorithmic description which specifies the interleaving scheme, together with user requirements (latency, throughput, communication interface, I/O parallelism...). It extracts the I/O data order by generating a trace from the C functional description, then generates the constraints file. This tool is used to generate the constraints from a C description.
- StarDFG takes as input a set of CDFGs generated by a High-Level Synthesis tool. These CDFGs are assumed to be scheduled and bound. The tool extracts the data communication order, then generates the constraints file. This tool is used to generate the constraints from a CDFG.
- STARGene, based on a five-step flow, generates the STAR architecture: (1) Multi-Modes Resource Constraints Graph (MMRCG) construction from the constraints file (generated by StarTor or StarDFG), (2) mode merging, (3) storage resource binding on the MMRCG, (4) architecture optimization and (5) VHDL RTL generation.
- StarBench generates a test bench based on constraints in order to validate the design by comparison of simulation results.
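The kind of information a trace of the I/O data order carries can be sketched as follows. This is a hypothetical illustration, not the StarTor or STARGene implementation: it assumes one datum enters per cycle in natural order and one datum leaves per cycle in interleaved order, and the function names and the small row/column interleaving formula are assumptions made for the example.

```python
from typing import Callable, List, Tuple


def io_trace(n: int, interleave: Callable[[int], int]) -> Tuple[List[int], int, List[Tuple[int, int]]]:
    """Derive output order, start latency and per-datum lifetimes.

    Assumptions: datum d arrives at cycle d; the k-th output is datum
    interleave(k); a datum can be output at the earliest one cycle after
    its arrival.
    """
    order = [interleave(k) for k in range(n)]
    # Smallest output start cycle such that every datum has already arrived.
    latency = max(d - k for k, d in enumerate(order)) + 1
    out_time = {d: latency + k for k, d in enumerate(order)}
    lifetimes = [(d, out_time[d]) for d in range(n)]  # (arrival, departure)
    return order, latency, lifetimes


def max_live(lifetimes: List[Tuple[int, int]]) -> int:
    """Largest number of data stored at the same cycle (storage lower bound)."""
    end = max(leave for _, leave in lifetimes)
    return max(sum(1 for arrive, leave in lifetimes if arrive <= t < leave)
               for t in range(end))


# Illustrative formula: write a 2x3 array row by row, read it column by column.
ROWS, COLS = 2, 3
order, latency, lifetimes = io_trace(ROWS * COLS,
                                     lambda k: (k % ROWS) * COLS + k // ROWS)
storage = max_live(lifetimes)
```

For this 6-element example the output order is 0, 3, 1, 4, 2, 5, the first output can be produced at cycle 3, and at most 3 data are stored simultaneously. Lifetimes of this kind are the sort of timing constraint that a storage binding step has to satisfy with FIFOs, LIFOs or registers.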

In a first experiment [GLSVLSI], our design flow was used to generate an industrial Ultra Wide Band interleaver. These experiments were performed in collaboration with STMicroelectronics. Using our flow, we show that we save memory resources and decrease the latency in every case, compared to the classical memory-based approach. Moreover, the number of structures to be controlled is smaller with our model than in the reference design from STMicroelectronics. Currently, the total area of the generated design is about 14% smaller than that of the reference design from STMicroelectronics (generated with a widespread commercial HLS tool).
In a second experiment [ICCAD], we use the STAR design flow within an HLS flow in order to generate a reconfigurable (multi-mode) datapath. These experiments were performed to generate multi-throughput (FFT 64 to 8, FIR 64 to 8...) and multi-configuration (FFT and IFFT, DCT and FIR...) architectures. They show the efficiency of combining (1) our approach and (2) the multi-mode scheduling and binding algorithms of the HLS tool GAUT, developed at UBS / LESTER Lab, for the generation and optimization of the storage part and the steering logic of a datapath. We reduce the total area by up to 75% compared to a cumulative architecture, and by up to 40% compared to the systems generated by a dedicated multi-mode design flow (SPACT_MR).

Three main releases of the STAR software are available:
STAR v0.7 :
Goal : Pipelined version of the STAR System
Last SW releases : dld
Reference paper: Trans. CAD


STAR v0.6 :
Goal : This approach presents a solution to efficiently explore the design space of Multi-Mode (or Multi-Configuration) communication adapters. Given a unified description of a set of time-wise mutually exclusive tasks and their associated throughput constraints, a single register transfer level hardware architecture optimized in area is generated. In order to reduce the complexity of the registers, the steering logic and the controller, the approach proposes a joint-scheduling algorithm, which maximizes the similarities between the control steps, and specific binding approaches for both operators and storage elements, which maximize the similarities between the datapaths (see the reference paper). Our design flow takes as input a C description of the Input/Output data scheduling and user requirements (throughput, latency, parallelism...), and formalizes the communication constraints through a Multi-Mode Resource Constraints Graph (MMRCG). The MMRCG properties enable an efficient architecture space exploration in order to synthesize a Multi-Configuration component.
Last SW releases : dld (this version is included in GAUT)
Reference paper: ICCAD


STAR v0.5 :
Goal : This approach presents a solution to efficiently explore the design space of communication adapters. In most digital signal processing (DSP) applications, the overall architecture of the system is significantly affected by communication architecture, so the designers need specifically optimized adapters. By explicitly modeling these communications within an effective graph-theoretic model and analysis framework, we automatically generate an optimized architecture, named Space-Time AdapteR (STAR). Our design flow inputs a C description of Input/Output data scheduling, and user requirements (throughput, latency, parallelism...), and formalizes communication constraints through a Resource Constraints Graph (RCG). The RCG properties enable an efficient architecture space exploration in order to synthesize a STAR component.
Last SW releases : dld
GUI SW releases : dld
Reference paper: GLSVLSI


GAUT

GAUT is an academic High-Level Synthesis tool dedicated to Digital Signal Processing (DSP) applications.
Starting from a pure C function, GAUT extracts the potential parallelism before selecting/allocating operators, then scheduling and binding operations.
The mandatory design constraints are (1) the throughput (the initiation interval), (2) the clock period and (3) the target technology. The optional design constraints are the I/O timing diagram and the memory mapping.
GAUT synthesizes a potentially pipelined architecture composed of a processing unit, a memory unit, a communication and multiplexing unit and a GALS/LIS interface.
GAUT generates an IEEE P1076-compliant RTL VHDL file. This VHDL file is an input for commercial off-the-shelf logic synthesis tools such as ISE/Foundation from Xilinx and Design Compiler from Synopsys.

GAUT is freely downloadable!