SAGE - Static Address Generation Easing
For high-throughput applications, turbo-like iterative
decoders are implemented with parallel architectures. However,
to be efficient, parallel architectures must avoid access
collisions, i.e. concurrent read/write accesses must not target the
same memory block. This consideration applies to the two main
classes of turbo-like codes, namely Low-Density Parity-Check
(LDPC) codes and Turbo-Codes. In this research, we propose
methodologies that find a collision-free mapping of the
variables in the memory banks and optimize the
resulting interleaving architecture.
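To make the collision constraint concrete, here is a minimal Python sketch that checks whether a given bank assignment is collision-free for both the natural and the interleaved access order. It is only an illustration, not the SAGE algorithm: the function names, the windowed access pattern, the 8-element interleaver and the two candidate mappings are all assumptions chosen for the example.

```python
from typing import List, Sequence


def access_schedule(order: Sequence[int], parallelism: int) -> List[List[int]]:
    """Return, for each time step, the data indices accessed in parallel.

    Assumption: the block is split into `parallelism` equal windows and
    processing element p walks through window p, one access per time step.
    """
    if len(order) % parallelism != 0:
        raise ValueError("block length must be a multiple of the parallelism")
    steps = len(order) // parallelism
    return [[order[p * steps + t] for p in range(parallelism)]
            for t in range(steps)]


def conflicting_steps(schedule: List[List[int]], bank_of: Sequence[int]) -> List[int]:
    """Return the time steps at which two accesses hit the same bank."""
    bad = []
    for t, indices in enumerate(schedule):
        banks = [bank_of[i] for i in indices]
        if len(set(banks)) < len(banks):
            bad.append(t)
    return bad


# Hypothetical example: 8 data, 2 processing elements, 2 memory banks.
natural = list(range(8))
interleaved = [0, 5, 2, 7, 3, 6, 1, 4]   # illustrative interleaver only

naive_mapping = [i // 4 for i in range(8)]   # first half in bank 0, rest in bank 1
fixed_mapping = [0, 0, 1, 1, 1, 1, 0, 0]     # hand-built collision-free mapping

for name, mapping in (("naive", naive_mapping), ("fixed", fixed_mapping)):
    nat = conflicting_steps(access_schedule(natural, 2), mapping)
    itl = conflicting_steps(access_schedule(interleaved, 2), mapping)
    print(name, "natural conflicts:", nat, "interleaved conflicts:", itl)
```

In this toy case the naive mapping is fine in natural order but collides at every step of the interleaved order, while the second mapping satisfies both orders at once.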
Three main classes of approaches are explored:
SAGE v0.2 : dld
Goal : This approach is a new memory mapping method dedicated
to any block-based parallel memory system. It is able to generate a
conflict-free memory mapping while optimizing the resulting
interconnection architecture (by targeting user-defined steering
components), even if the interleaving rules or the communication
schemes do not intrinsically allow it.
Reference paper: Paper under review
SAGE v0.1 : dld
Goal : This approach is a methodology that finds a collision-free mapping of the variables in the memory banks and optimizes the resulting interleaving architecture by targeting a user-defined architecture, if the interleaving enables it. This approach is dedicated to turbo-like codes (the processing steps, in/out order, are performed separately for each data block).
Last SW releases :
Reference paper: ICASSP
STAR - Space-Time AdapteR
Digital Signal Processing (DSP) applications are now widely used, from automotive to wireless communications.
The ever-growing design complexity, the performance requirements, and the constraints on design cost and
power consumption still require significant parts of a design to be implemented using a set of dedicated
hardware accelerators. A classical complex DSP application architecture uses several complex processing
elements, many memories, data mixing modules (interleavers for Turbo-Codes, spatial redundancy blocks
for OFDM/MIMO systems...), and is based on a point-to-point communication network for communications
between processing elements. Such a system may also need to integrate several applications in a single
architecture ((re)configurable systems). Today, the cost of such systems in terms of memory elements is very high,
which is why designers try to reduce the size of the embedded buffers in order to reduce the overall
design area and power consumption, and to improve design performance. In our work, we focus on the optimisation
of component communication interfaces. This problem can be seen as the synthesis (1) of interfaces for IP
core integration, (2) of data mixing blocks (such as interleavers) with multi-mode architectures, and (3)
of (re)configurable datapaths in high-level synthesis flows.
We propose a design methodology to automatically generate and optimize a communication adapter named
Space-Time AdapteR (STAR). Our design flow takes as input (1) a timing diagram (constraint file), or (2) a
C description of I/O data scheduling (an interleaving formula) and user requirements (throughput, latency...),
or (3) a set of scheduled and bound CDFGs, and formalizes communication constraints through a formal
Multi-Modes Resource Constraints Graph (MMRCG). The MMRCG properties enable efficient architecture space
exploration to generate a Register Transfer Level (RTL) STAR component.
The STAR architecture is composed of a datapath (using FIFOs, LIFOs and/or registers) and the associated
control state machines. Spatial adaptation (a data item can be sent from any input port to any one or several
output ports) is performed by interconnection logic. Timing adaptation (data reordering) is performed by the
storage elements. The STAR component uses a LIS (Latency Insensitive System) interface, which makes it
possible to implement a gated-clock mechanism. The proposed design flow can generate multi-mode architectures.
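The role of the storage elements in timing adaptation can be illustrated with a small Python sketch. It is not the STAR binding algorithm: it only assumes that each data item has a known write cycle and read cycle, counts how many items must be stored at the same time, and reports whether a plain FIFO or LIFO order would be enough. All function names and the example timings are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def peak_storage(write_cycle: Dict[Hashable, int],
                 read_cycle: Dict[Hashable, int]) -> int:
    """Maximum number of data items alive at once.

    Assumption: an item occupies storage from its write cycle up to
    (but not including) its read cycle.
    """
    events = []
    for item, w in write_cycle.items():
        r = read_cycle[item]
        if r < w:
            raise ValueError(f"item {item!r} is read before it is written")
        events.append((w, +1))
        events.append((r, -1))
    # Tuples sort so that, in the same cycle, a release (-1) comes before
    # an allocation (+1): a slot freed in cycle t can be reused in cycle t.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def storage_style(in_order: Sequence[Hashable],
                  out_order: Sequence[Hashable]) -> str:
    """Classify a reordering by the simplest structure that realizes it."""
    if list(out_order) == list(in_order):
        return "FIFO"
    if list(out_order) == list(reversed(in_order)):
        return "LIFO"
    return "registers"


# Hypothetical timing: four items written in cycles 0..3, read back reversed.
writes = {"a": 0, "b": 1, "c": 2, "d": 3}
reads = {"d": 4, "c": 5, "b": 6, "a": 7}
print(peak_storage(writes, reads))               # 4 items stored at the peak
print(storage_style("abcd", "dcba"))             # LIFO
```

A reordering that is neither order-preserving nor a full reversal falls back to individually addressed registers in this sketch; a real flow would typically mix the three kinds of storage.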
The design flow is based on the following tools:
- StarTor takes as input a C-level algorithmic description that specifies the interleaving scheme, together
with user requirements (latency, throughput, communication interface, I/O parallelism...). It extracts the
I/O data order by generating a trace from the C functional description, then generates the constraints file.
This tool is used to generate the constraints from a C description.
- StarDFG takes as input a set of CDFGs generated by a High-Level Synthesis tool. These CDFGs are assumed to
be scheduled and bound. The tool extracts the data communication order, then generates the constraints
file. This tool is used to generate the constraints from a CDFG.
- STARGene, based on a five-step flow, generates the STAR architecture: (1) Multi-Modes Resource
Compatibility Graph construction from the constraint file (generated by StarTor or StarDFG), (2) modes
merging, (3) storage resource binding on the MMRCG, (4) architecture optimization and (5) VHDL
RTL generation.
- StarBench generates a test bench based on constraints in order to validate the design by comparison
of simulation results.
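The trace-based extraction step described for StarTor can be sketched in Python as follows. This is only an illustration under stated assumptions: the real tool works from a C description and writes its own constraints file format, which is not documented here. The record fields, the function names and the 2x3 block interleaver are all hypothetical.

```python
from typing import Callable, Dict, List


def min_latency(interleave: Callable[[int], int], n: int,
                in_ports: int, out_ports: int) -> int:
    """Smallest output start cycle such that every item leaves strictly
    after the cycle in which it arrives (assumed causality rule)."""
    return max(i // in_ports - interleave(i) // out_ports for i in range(n)) + 1


def extract_constraints(interleave: Callable[[int], int], n: int,
                        in_ports: int, out_ports: int,
                        latency: int) -> List[Dict[str, int]]:
    """Run the interleaving function and record, for each data item,
    when and where it enters and leaves the adapter."""
    rows = []
    for i in range(n):
        j = interleave(i)              # output position of input item i
        rows.append({
            "data": i,
            "in_cycle": i // in_ports,
            "in_port": i % in_ports,
            "out_cycle": latency + j // out_ports,
            "out_port": j % out_ports,
        })
    return rows


def block_interleaver(i: int) -> int:
    """Hypothetical 2x3 block interleaver: written row by row, read column by column."""
    rows_count, cols_count = 2, 3
    return (i % cols_count) * rows_count + i // cols_count


latency = min_latency(block_interleaver, 6, in_ports=1, out_ports=1)
for row in extract_constraints(block_interleaver, 6, 1, 1, latency):
    print(row)
```

Each printed record plays the role of one line of a timing-diagram constraint: an input event and an output event for the same data item.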
In a first experiment [GLSVLSI], our design flow has been used to generate an
Ultra Wide Band interleaver. This is an industrial test case, and these experiments have been
performed in collaboration with STMicroelectronics. Using our flow, we show that we can save memory
resources and decrease the latency in all cases, compared to the classical memory-based approach.
Moreover, the number of structures to be controlled is smaller with our model than in the reference
design from STMicroelectronics. Currently, the total area of the generated design is about 14% smaller
than the reference design from STMicroelectronics (generated with a widespread commercial HLS tool).
In a second experiment [ICCAD], we use the STAR design flow in a HLS flow in order to generate a reconfigurable
(multi-mode) datapath. These experiments have been performed to generate multi-throughput
(FFT 64 to 8, FIR 64 to 8...) and multi-configuration (FFT and IFFT, DCT and FIR...) architectures.
These experiments show the efficiency of the combination of (1) our approach and (2) the multi-mode
scheduling and binding algorithms of the HLS tool GAUT, developed at the UBS University /
LESTER Lab, for the generation and the optimization of the storage part and the steering logic of
a datapath. We reduce the total area by up to 75% compared to a cumulative architecture, and by up to 40%
compared to the systems generated by a dedicated multi-mode design flow (SPACT_MR).
In most digital signal processing (DSP) applications, the overall architecture of the system is
significantly affected by communication architecture, so the designers need specifically optimized
adapters. By explicitly modeling these communications within an effective graph-theoretic model and
analysis framework, we automatically generate an optimized architecture, named Space-Time AdapteR
(STAR). Our design flow inputs a C description of Input/Output data scheduling, and user requirements
(throughput, latency, parallelism...), and formalizes communication constraints through a Resource Constraints
Graph (RCG). The RCG properties enable an efficient architecture space exploration in order to synthesize a
STAR component. The proposed approach has been tested on the design of an industrial data mixing block: an
Ultra-Wideband interleaver.
Three main releases of the STAR software are available:
STAR v0.7 :
Goal : Pipelined version of the STAR System
Last SW releases : dld
Reference paper: Trans. CAD
STAR v0.6 :
Goal : This approach presents a solution to efficiently explore the design space of Multi-Mode (or Multi-Configuration) communication adapters. Given a unified description of a set of time-wise mutually exclusive tasks and their associated throughput constraints, a single register transfer level hardware architecture optimized in area is generated. In order to reduce the register, the steering logic, and the controller complexities, the approach proposes a joint-scheduling algorithm, which maximizes the similarities between the control steps and specific binding approaches for both operators and storage elements which maximize the similarities between the datapaths (see the reference paper). Our design flow inputs a C description of Input/Output data scheduling, and user requirements (throughput, latency, parallelism...), and formalizes communication constraints through a Multi-Mode Resource Constraints Graph (MMRCG). The MMRCG properties enable an efficient architecture space exploration in order to synthesize a Multi-Configuration component.
Last SW releases : dld. This version is included in GAUT.
Reference paper: ICCAD
STAR v0.5 :
Goal : This approach presents a solution to efficiently explore the design space of communication adapters. In most digital signal processing (DSP) applications, the overall architecture of the system is significantly affected by communication architecture, so the designers need specifically optimized adapters. By explicitly modeling these communications within an effective graph-theoretic model and analysis framework, we automatically generate an optimized architecture, named Space-Time AdapteR (STAR). Our design flow inputs a C description of Input/Output data scheduling, and user requirements (throughput, latency, parallelism...), and formalizes communication constraints through a Resource Constraints Graph (RCG). The RCG properties enable an efficient architecture space exploration in order to synthesize a STAR component.
Last SW releases : dld
GUI SW releases : dld
Reference paper: GLSVLSI

GAUT is an academic High-Level Synthesis tool dedicated to Digital Signal Processing (DSP) applications.
Starting from a pure C function, GAUT extracts the potential parallelism before selecting/allocating operators,
scheduling and binding operations.
The mandatory design constraints are (1) the throughput (the initiation interval), (2) the clock period and
(3) the target technology. The optional design constraints are the I/O timing diagram and the memory mapping.
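The way the initiation interval and the clock period bound operator allocation can be shown with a short Python sketch. It uses the standard resource lower bound from scheduling theory (total busy cycles divided by the cycles available per iteration, rounded up), assuming non-pipelined operators. It is not GAUT's actual allocation algorithm, and the function names and operation counts are hypothetical.

```python
from typing import Dict


def cycles_per_iteration(initiation_interval_ns: int, clock_period_ns: int) -> int:
    """Number of clock cycles available between two successive iterations."""
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    if initiation_interval_ns < clock_period_ns:
        raise ValueError("initiation interval is shorter than one clock cycle")
    return initiation_interval_ns // clock_period_ns


def min_operators(op_counts: Dict[str, int],
                  op_latency_cycles: Dict[str, int],
                  ii_cycles: int) -> Dict[str, int]:
    """Lower bound on the number of operators of each type.

    Assumption: operators are not pipelined, so an operation keeps its
    operator busy for its whole latency.
    """
    bound = {}
    for op, count in op_counts.items():
        busy_cycles = count * op_latency_cycles[op]
        bound[op] = (busy_cycles + ii_cycles - 1) // ii_cycles   # ceiling division
    return bound


# Hypothetical kernel: 8 multiplications (2 cycles each), 7 additions (1 cycle each).
ii = cycles_per_iteration(initiation_interval_ns=40, clock_period_ns=10)
print(ii)                                                   # 4
print(min_operators({"mul": 8, "add": 7}, {"mul": 2, "add": 1}, ii))
```

A tighter throughput constraint (smaller initiation interval) raises the bound, which is the area/throughput trade-off that the mandatory constraints expose.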
GAUT synthesizes a potentially pipelined architecture composed of a processing unit, a memory unit,
a communication and multiplexing unit and a GALS/LIS interface.
GAUT generates an IEEE P1076 compliant RTL-level VHDL file. This VHDL file is an input for commercial
off-the-shelf logic synthesis tools such as ISE/Foundation from Xilinx and Design Compiler from Synopsys.
GAUT is freely downloadable!