Introduction
Warning: This is an incomplete draft! Also, I didn’t create the FST format, so there may be errors in this specification. If you find any, please let me know.
The only standard format for storing digital waveforms from HDL simulations is VCD (Value Change Dump). Unfortunately it’s an ancient and hilariously inefficient text based format. It’s really only suitable for tiny projects. As a result most commercial EDA vendors use their own more efficient proprietary formats, for example Synopsys’s VCS simulator uses FSDB (Fast Signal DataBase) and Mentor’s Questa simulator uses WLF (Wave Log Format).
Because these formats are all proprietary, the GtkWave authors created a series of formats improving on VCD, culminating in FST (Fast Signal Trace), created in 2014. It is supported by GtkWave, Verilator and CVC, and although it has some quirks it is much, much better than VCD.
Unfortunately there is no official specification for FST, so I wrote this one based on reading the GtkWave FST source code.
Format Overview
Below is a hyperlinked diagram of the entire format. It is a block based format with a few metadata blocks and a set of Value Change blocks which store the actual waveform data. Each Value Change block spans a period of time and includes the initial value of all variables, so it can be decoded independently of any other block. It is designed so the waveforms for a few variables can be extracted without having to decode the entire block.
The hierarchy block records the design hierarchy (modules, functions, ports, etc.). The geometry block records the length of each variable in bits. Finally the blackout block records when `$dumpoff` and `$dumpon` were executed.
The data in the blocks is compressed with a bewildering variety of compression algorithms and custom encoding schemes.
Data Types
The following data types are used in this document.
Type | Size (bytes) | Signed? | Endianness | Notes |
---|---|---|---|---|
`u8` | 1 | No | Invariant | |
`i8` | 1 | Yes | Invariant | |
`u64` | 8 | No | Big | |
`i64` | 8 | Yes | Big | |
`f64` | 8 | Yes | Native | Floating point |
`varint` | 1-10 | No | Invariant | See Varints. |
`svarint` | 1-10 | Yes | Invariant | See Varints. |
Null terminated strings are also used at several points in the format. The encoding is not specified; however, it is reasonable to assume it is ASCII. The strings are used for display purposes only, so the encoding is not critical.
Endianness
Most types have a defined Endianness (unfortunately Big Endian); however, `f64` is for some reason written using native Endianness. The `header_real_endianness` header field is an `f64` with the constant value e (i.e. 2.72…) that can be used to detect the Endianness used by the file writer.
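The check described above can be sketched in a few lines of Python. This is only an illustration: the function name `detect_real_endianness` is made up for this example, and it assumes you have already sliced the 8 raw bytes of `header_real_endianness` out of the header.

```python
import math
import struct


def detect_real_endianness(raw: bytes) -> str:
    """Return the struct prefix ('<' little, '>' big) that decodes raw as e.

    raw is assumed to be the 8 bytes of the header_real_endianness field.
    """
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    for prefix in ("<", ">"):
        (value,) = struct.unpack(prefix + "d", raw)
        # isclose() returns False for NaN/inf, which a byte-swapped e may produce.
        if math.isclose(value, math.e, rel_tol=1e-12):
            return prefix
    raise ValueError("bytes do not encode e in either byte order")


# Example: the bytes a Little Endian writer would have produced.
sample = struct.pack("<d", math.e)
print(detect_real_endianness(sample))
```

The returned prefix can then be reused with `struct.unpack` for every other `f64` in the file.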
Varints
Varints are encoded using the LEB128 (Little Endian Base 128) format, limited here to 64-bit values. In short, the upper bit of each byte indicates whether it is the last byte in the varint (1=not last byte, 0=last byte). The bits of the number are then spread over the lower 7 bits of each byte in Little Endian order (the first byte in the `varint` contains the 7 lowest bits of the encoded number).
Figure: `varint` format used by FST.
The encoding for signed numbers (`svarint`) is the same but a sign bit is always retained. When decoding, the number is sign-extended instead of zero-extended to 64 bits to give a 2’s complement signed number.
Figure: `svarint` format used by FST.
Note that the GtkWave code assumes that some `varint`s fit inside 32 bits. However it is inconsistent about this. For example `max_blackouts` is written as `varint64` but read as `varint32`. Therefore we do not distinguish `varint32` and `varint64` in this specification. It is probably best to assume that all values can be up to 64 bits.
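As a rough illustration of the two encodings, here is a minimal Python decoder. The function names are my own, and the code does no bounds or overlong-encoding checks, so treat it as a sketch rather than a reference implementation.

```python
from typing import Tuple


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an unsigned varint at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        shift += 7
        if byte & 0x80 == 0:  # top bit clear: this was the last byte
            return result, pos


def decode_svarint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a signed varint at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:  # sign bit of the final byte is set
                result -= 1 << shift  # sign-extend
            return result, pos


print(decode_varint(bytes([0xE5, 0x8E, 0x26])))   # (624485, 3)
print(decode_svarint(bytes([0xC0, 0xBB, 0x78])))  # (-123456, 3)
```

Python integers are unbounded, so the sign extension is done arithmetically. In a language with fixed-width integers you would sign-extend to 64 bits as described above.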
Design Hierarchy, Aliases and Variable IDs
The Hierarchy Block describes the design hierarchy: essentially all the modules and ports. Each recorded signal is called a variable. Since many of the variables are directly connected to each other in the design (sometimes many times, as with clocks and resets), FST allows you to declare some variables to be aliases of others. Every distinct variable has a unique variable ID, numbered from 0 to `header_num_vars` - 1 based on the order that they appear in the Hierarchy Block.
Blocks
An FST file is composed of a sequence of TLV (Tag, Length, Value) blocks (AKA sections), all with the following header.
Offset | Type | Description |
---|---|---|
0 |
|
Block type (see |
1 |
|
Length of the block in bytes, including this length value but not including the block type byte. |
9 |
- |
The block data. |
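To make the framing concrete, here is a small Python sketch that walks the blocks of a stream. It assumes the block type is a `u8` and the length is a Big Endian `u64` (which the offsets 0, 1 and 9 imply). The name `iter_blocks` is just for this example.

```python
import io
import struct


def iter_blocks(stream):
    """Yield (block_type, block_data) for each block in a binary stream."""
    while True:
        tag = stream.read(1)
        if not tag:
            return  # clean end of file
        raw_length = stream.read(8)
        if len(raw_length) != 8:
            raise ValueError("truncated block length")
        (length,) = struct.unpack(">Q", raw_length)
        # The length counts its own 8 bytes but not the block type byte.
        if length < 8:
            raise ValueError("block length too small")
        data = stream.read(length - 8)
        if len(data) != length - 8:
            raise ValueError("truncated block data")
        yield tag[0], data


# Example with two tiny made-up blocks.
demo = bytes([1]) + struct.pack(">Q", 8 + 3) + b"abc"
demo += bytes([3]) + struct.pack(">Q", 8) + b""
for block_type, data in iter_blocks(io.BytesIO(demo)):
    print(block_type, len(data))
```

A real reader would probably seek past blocks it is not interested in rather than reading their data into memory.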
BlockType
The block type can be one of the following values:
Name | Value | Description |
---|---|---|
0 |
Header block, found at the start of the file. |
|
1 |
Value Change data. Records the actual waveforms for a block of time. |
|
2 |
Stores the times when |
|
3 |
Stores the length of each variable. |
|
4 |
Hierarchy data (names of modules, wires etc.) |
|
5 |
Newer version of |
|
6 |
Hierarchy data compressed with LZ4 |
|
7 |
Hierarchy data compressed with LZ4 twice. This gives better compression. |
|
8 |
Even newer version of |
|
254 |
This block is an entire FST file that has been GZipped. |
|
255 |
Value Change blocks are set to this type while being written. |
File Structure
The order of blocks in an FST file is as follows. The Header Block is followed by any number of Value Change blocks. When the file is finalised a Geometry Block, an optional Blackout Block (omitted if there are no blackouts), and an optional Hierarchy Block are appended.
Block Type | Count |
---|---|
1 |
|
1 |
|
0/1 |
|
0/1 |
When a tool is writing out an FST file, it actually writes two separate files: the main file `foo.fst`, and an auxiliary file `foo.fst.hier`. When the `.fst` file is finalised the `.hier` file is optionally appended to it and then deleted. It is also possible to just leave the `.hier` file as a separate file.
Additionally the entire FST file can be repacked using GZip when finalised so it appears as a single GZip Block. I am not sure why this feature exists. I recommend not supporting this. If you want this functionality, support opening `.fst.gz` files directly instead.
Header Block
An FST file always starts with a header block. There is no magic number before it. The header block has the following structure.
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Block type ( |
|
1 |
|
Block length (329). |
|
9 |
|
Start time of the file. Units are given by |
|
17 |
|
End time of the file. |
|
25 |
|
The value e (2.7182818284590452354). This is used as an endianness test for reals. See Endianness. This number can also be used as a magic number to check if a file is an FST file. |
|
33 |
|
Memory used when writing this file in bytes. For informational purposes. |
|
41 |
|
Number of scopes ( |
|
49 |
|
Number of variables in the hierarchy. |
|
57 |
|
Number of variables that are distinct - that is, not structurally equivalent. The same variable (e.g. a clock) may appear many times in the hierarchy but its values are only stored once. |
|
65 |
|
Number of Value Change blocks in the file. |
|
73 |
|
Order of magnitude of the time unit. 0=1s, -9=1ns, etc. |
|
74 |
|
Simulator identifier. Should be null terminated if shorter than 128 bytes. If 128 bytes it does not need to be null terminated. |
|
202 |
|
Null terminated date string as returned by |
|
228 |
- |
Reserved for future use. Should be filled with zeros when written. |
|
321 |
|
File type (see |
|
322 |
|
Timezero ( |
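Since every field sits at a fixed offset, the whole header can be unpacked in one go. The Python sketch below assumes the integer fields are Big Endian `u64`s, the time unit is an `i8` and the timezero is an `i64`. The field names in the `Header` tuple are hypothetical labels chosen for this example (apart from `real_endianness`, which mirrors `header_real_endianness`).

```python
import math
import struct
from collections import namedtuple

# u8 type, u64 length, 2 x u64 times, 8 raw bytes for e, 5 x u64 counters,
# i8 time unit, 128-byte simulator, 26-byte date, 93 reserved, u8 file type,
# i64 timezero. Total: 330 bytes.
HEADER_FORMAT = ">BQQQ8sQQQQQb128s26s93sBq"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

Header = namedtuple("Header", [
    "block_type", "block_length", "start_time", "end_time",
    "real_endianness", "writer_memory_use", "num_scopes", "num_vars",
    "num_distinct_vars", "num_vc_blocks", "timescale_exponent",
    "simulator", "date", "reserved", "file_type", "timezero",
])


def _c_string(field: bytes) -> str:
    """Cut a fixed-size field at the first NUL (if any) and decode it."""
    return field.split(b"\0", 1)[0].decode("ascii", errors="replace")


def parse_header(raw: bytes) -> Header:
    if len(raw) < HEADER_SIZE:
        raise ValueError("header block is truncated")
    header = Header._make(struct.unpack_from(HEADER_FORMAT, raw))
    if header.block_type != 0 or header.block_length != HEADER_SIZE - 1:
        raise ValueError("not an FST header block")
    return header._replace(simulator=_c_string(header.simulator),
                           date=_c_string(header.date))


# Example: build a fake header and parse it back.
fake = struct.pack(HEADER_FORMAT, 0, 329, 0, 1000, struct.pack("<d", math.e),
                   0, 3, 10, 7, 2, -9, b"MySim", b"Thu Jan  1 00:00:00 1970\n",
                   b"", 0, 0)
print(parse_header(fake).simulator, parse_header(fake).timescale_exponent)
```

The `real_endianness` bytes are deliberately left raw so that the endianness test can be applied to them before any other `f64` is decoded.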
Hierarchy Block
This records the design hierarchy and all the signal names.
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Block type ( |
|
1 |
|
Block length. |
|
9 |
|
Uncompressed length of |
|
17 |
|
Only present for |
|
17/25 |
- |
Compressed hierarchy data. |
The `hierarchy_data` compression method is given by `hierarchy_type` as follows:
Block Type | Compression |
---|---|
GZip |
|
LZ4 |
|
LZ4 applied twice. The GtkWave code uses this if the hierarchy data is more than 4 MB. |
The field `hiearchy_compressed_once_length` is only present if the block type is `FST_BL_HIER_LZ4DUO`. It records the length of the data after one application of LZ4.
Note that unlike elsewhere, the compression is unconditional. You shouldn’t check whether the uncompressed length is the same as the compressed length.
After decompression the `hierarchy_data` is a list of tagged values. The tags are `u8` with the following values:
Each tag is followed by some variable length data. It doesn’t include an explicit length field like TLV so you can’t skip entries without parsing them.
FST_ST_GEN_ATTRBEGIN
Begin an attribute for the current scope. This will be followed by an `FST_ST_GEN_ATTREND` unless the type is `FST_AT_MISC`, which shouldn’t have one.
FST_ST_GEN_ATTREND
No data. This is just used to mark the end of an attribute.
FST_ST_VCD_SCOPE
Enter a new scope (module, function, etc.).
- `u8`: Type (see `ScopeType`).
- `u8[up to 512]`: Name. This must be null terminated.
- `u8[up to 512]`: Component. This must be null terminated.
FST_ST_VCD_UPSCOPE
No data. Just used to mark the end of a scope.
FST_VT_VCD_*
- `u8`: Direction for ports (see `VarDir`).
- `u8[up to 512]`: Name. This must be null terminated.
- `varint`: Length of the variable in bits. If this is `FST_VT_VCD_PORT` the length is interpreted differently.
- `varint`: Structural alias to an existing variable ID. If this is an alias it is set to the variable ID plus 1. If it is not an alias it is set to 0 and the variable is assigned an ID one more than the previous non-aliased variable.
For example if the reader encounters the following alias values it will assign the resulting variable IDs:
Alias varint | Assigned variable ID |
---|---|
0 | 0 |
0 | 1 |
0 | 2 |
0 | 3 |
2 | Alias to variable ID 1 |
1 | Alias to variable ID 0 |
0 | 4 |
0 | 5 |
6 | Alias to variable ID 5 |
0 | 6 |
Structural aliases are used when the same functionally equivalent signal appears in multiple places in the hierarchy (e.g. with clocks and resets). The value changes of these variables are only encoded once. This is different to dynamic aliases which are used when two variables happen to have the same waveform within a block.
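The ID assignment rule for structural aliases is easy to get off by one, so here is a short Python sketch of it. The function name and the `(variable_id, is_alias)` return shape are choices made for this example only.

```python
def assign_variable_ids(alias_fields):
    """Map each variable's alias varint to (variable_id, is_alias).

    alias_fields holds the alias varint of every variable entry, in the
    order the entries appear in the hierarchy data.
    """
    assigned = []
    next_id = 0
    for alias in alias_fields:
        if alias == 0:
            # Not an alias: takes the next unused variable ID.
            assigned.append((next_id, False))
            next_id += 1
        else:
            # Alias: the field stores the target variable ID plus 1.
            assigned.append((alias - 1, True))
    return assigned


example = [0, 0, 0, 0, 2, 1, 0, 0, 6, 0]
for alias, (var_id, is_alias) in zip(example, assign_variable_ids(example)):
    print(alias, "alias to" if is_alias else "new ID", var_id)
```

Run on the alias values from the example table, this gives the same IDs as listed there.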
Geometry Block
This describes the length of each variable in bits.
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Block type ( |
|
1 |
|
Block length. |
|
9 |
|
Length of uncompressed data (or equal to the compressed length if not compressed). |
|
17 |
|
Number of length entries in the data. |
|
25 |
- |
Compressed geometry data. Compressed length is |
The geometry data is compressed using ZLib, unless `geom_uncompressed_length == geom_length - 24`, in which case it is uncompressed.
The data is an array of `geom_count` `varint`s that record the length of each variable. The length is recorded as 0 for reals and 0xFFFFFFFF for zero length variables. Note that this is not the maximum value a `varint` can encode. It is just a very large value.
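A possible decoder for the geometry data, as a Python sketch. It takes the raw data bytes that follow the count field, so comparing their length against the uncompressed length is equivalent to the `geom_length - 24` check. The `("real", 64)` and `("bits", n)` tuples are a representation invented for this example.

```python
import zlib


def read_varint(data, pos):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos


def decode_geometry(data, uncompressed_length, count):
    """Return one (kind, length_in_bits) tuple per variable."""
    if len(data) != uncompressed_length:
        data = zlib.decompress(data)  # lengths differ: data is ZLib compressed
    geometry = []
    pos = 0
    for _ in range(count):
        value, pos = read_varint(data, pos)
        if value == 0:
            geometry.append(("real", 64))  # 0 marks a real
        elif value == 0xFFFFFFFF:
            geometry.append(("bits", 0))   # marker for a zero length variable
        else:
            geometry.append(("bits", value))
    return geometry


# Lengths 1, real, zero length, 8 encoded as varints.
raw = bytes([0x01, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0x0F, 0x08])
print(decode_geometry(raw, len(raw), 4))
print(decode_geometry(zlib.compress(raw), len(raw), 4))
```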
Value Change Block
These blocks store the actual variable data. Each block stores the waveforms of all variables for a contiguous period of time.
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Block type ( |
|
1 |
|
Block length. |
|
9 |
|
Start time of the block. The units are given by |
|
17 |
|
End time of the block. |
|
25 |
|
Amount of buffer memory required when reading this block for a full Value Change traversal. |
|
33 |
|
Uncompressed length |
|
- |
|
Compressed length (equal to the uncompressed length if no compression). |
|
- |
|
Number of entries in the bits table. |
|
- |
- |
Bits Array data. Compressed with ZLib if the compressed and uncompressed lengths differ. |
|
- |
|
Number of waveforms in the waves table. |
|
- |
|
Compression type used for |
|
- |
- |
Set of deduplicated waveforms for this time period. |
|
- |
- |
Position Table data, encoded as described below. |
|
- |
|
Length of |
|
- |
- |
Time Table data. Compressed with ZLib. |
|
- |
|
Uncompressed length of time table. |
|
- |
|
Compressed length of time table (equal to uncompressed length if there’s no compression). |
|
- |
|
Number of items in the time table. |
It contains four tables - the bits array, waves table, position table and time table. Note that the lengths of the position and time tables come after their data, so you have to read backwards from the end to decode those tables. I am not sure of the reason for this.
Bits Array
The bits array stores the value of all signals at `vc_start_time`. `vc_bits_data` contains the value of each signal concatenated. The length of each signal (in bits) is given in the Geometry Block. All values are stored using the ASCII encoding (`0`, `1`, `X`, `Z`, etc.) with one bit per byte. Variable length records are not stored because they have no state. Reals are stored as Native Endian `f64` (`f32` is never used even if that is the actual datatype in the simulation). It is unclear how reals with `X` bits are stored.
The Bits Array is optionally compressed with ZLib (if `vc_bits_uncompressed_length` and `vc_bits_compressed_length` are unequal).
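Splitting the (already decompressed) Bits Array into per-variable initial values might look like the Python sketch below. The geometry list of `("bits", n)` and `("real", 64)` tuples is a made-up representation for this example, and I am assuming a real occupies exactly 8 bytes and a zero length variable occupies none.

```python
import struct


def split_bits_array(bits, geometry, real_byte_order="<"):
    """Return the initial value of each variable.

    bits            -- decompressed Bits Array bytes
    geometry        -- list of ("bits", n) or ("real", 64) tuples (hypothetical)
    real_byte_order -- struct prefix detected from the header endianness test
    """
    values = []
    pos = 0
    for kind, length in geometry:
        if kind == "real":
            (value,) = struct.unpack_from(real_byte_order + "d", bits, pos)
            pos += 8
        else:
            # One ASCII character ('0', '1', 'x', 'z', ...) per bit.
            value = bits[pos:pos + length].decode("ascii")
            pos += length
        values.append(value)
    if pos != len(bits):
        raise ValueError("geometry does not match the Bits Array length")
    return values


demo = b"1" + b"0x1z" + struct.pack("<d", 1.5)
layout = [("bits", 1), ("bits", 4), ("real", 64), ("bits", 0)]
print(split_bits_array(demo, layout))
```

This walks the whole array, which matches the observation that the initial value of a single variable cannot be found without knowing the lengths of all the variables before it.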
Waves Table
This table contains the actual value changes for each variable. These are deduplicated so that if two variables happen to have the same value changes for the time period that this block covers, that data will not be stored twice — even if the two variables are not structurally equivalent.
The data consists of `vc_waves_count` of the following:
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Uncompressed length of the waves. 0 means it is not compressed. |
|
- |
- |
Wave data. Compression type is given by |
The data that is stored is a series of (time_index_delta, value) pairs. The time_index_delta encodes an index into the Time Table (it is the delta from the previous index). The data pair is encoded differently depending on the variable type and length.
If the variable is a 1-bit value (e.g. `logic` or `bit` in SystemVerilog) then the time_index_delta and value are encoded as a single `varint` depending on its value:
Value | Varint Value |
---|---|
0 |
|
1 |
|
X |
|
Z |
|
H |
|
U |
|
W |
|
L |
|
- |
|
? |
|
SystemVerilog uses 0, 1, X and Z. VHDL can use all values. See https://en.wikipedia.org/wiki/IEEE_1164
The lowest bit indicates whether the value is 0/1 or not. 0 and 1 are encoded in a slightly more efficient way than the other values since they are so much more common.
If the variable is not a 1-bit value then the `time_index_delta` is encoded as its own `varint` together with an encoding mode bit:
time_index_delta << 1 | all_binary
If `all_binary` is 1 then this means the value only contains 0’s or 1’s. There are no X’s, Z’s and so on. In this case the values are encoded as raw bits packed into a whole number of bytes.
If `all_binary` is 0 then the data that follows is encoded as raw ASCII, e.g. "01Z011XX1".
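Here is a small Python sketch of how a reader might take apart one multi-bit change. Note the bit order inside the packed bytes is not stated above: the code assumes the first bit of the value is stored in the most significant bit of the first byte, with any padding in the low bits of the last byte. Both function names are invented for this example.

```python
def split_change_header(header):
    """Split the decoded varint into (time_index_delta, all_binary)."""
    return header >> 1, bool(header & 1)


def unpack_binary_value(packed, width):
    """Expand packed bits into a '0'/'1' string of the given width.

    ASSUMPTION: bits are packed most significant bit first.
    """
    n_bytes = (width + 7) // 8  # values occupy a whole number of bytes
    if len(packed) < n_bytes:
        raise ValueError("not enough packed bytes for this width")
    bits = "".join(format(byte, "08b") for byte in packed[:n_bytes])
    return bits[:width]


delta, all_binary = split_change_header((5 << 1) | 1)
print(delta, all_binary)                              # 5 True
print(unpack_binary_value(bytes([0b10110000]), 4))    # 1011
```

When `all_binary` is False the reader would instead take `width` ASCII bytes directly from the data.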
The rules for `FST_VT_VCD_REAL` are slightly different:
- If `all_binary` is 0 then the bits of the `f64` are encoded as ASCII as before (this is unlikely to happen but it is possible). If it is 1 then the value is a native Endian `f64`.
Position Table
This contains pointers into the value change data for each variable to allow deduplicating them. There are `header_num_vars` entries in the table. The pointers for each variable are decoded from the Position Table data in different ways depending on the Block Type.
FST_BL_VCDATA_DYN_ALIAS2
The Position Table data expands to an array of signed integers. The meaning of these decoded numbers is as follows:
Decoded position value | Meaning |
---|---|
0 |
The variable has no value changes. |
>0 |
This is a byte offset into |
<0 |
This is a "dynamic alias". The variable’s change data is exactly the same as the variable with this ID code (negated and minus one). |
For example if we have this sequence:
0 0 100 0 -3 0 200 350 -3
It means the following:
Variable ID | Integer Value | Meaning |
---|---|---|
0 |
0 |
This variable doesn’t change in this block. |
1 |
0 |
This variable doesn’t change in this block. |
2 |
100 |
The changes are at byte offset 101 in |
3 |
0 |
This variable doesn’t change in this block. |
4 |
-3 |
In this block this variable has the same changes as variable 2. |
5 |
0 |
This variable doesn’t change in this block. |
6 |
200 |
The changes are at byte offset 201 in |
7 |
350 |
The changes are at byte offset 351 in |
8 |
-3 |
In this block this variable has the same changes as variable 2. |
Those numbers are then encoded as follows.
- A run of 1 or more 0’s (i.e. any length of 0’s) is encoded as a `varint` equal to `run_length << 1`.
- All other values are encoded as an `svarint` equal to `value << 1 | 1` where `value` is:
  - If negative: 0 if it matches the previous negative value, otherwise the negative value itself.
  - If positive: The delta from the previous positive value.
So the above values would be encoded as:
Variable ID | Integer Value | Encoding |
---|---|---|
0 |
0 |
Run of two 0’s so |
1 |
0 |
- |
2 |
100 |
|
3 |
0 |
Run of one 0 so |
4 |
-3 |
|
5 |
0 |
Run of one 0 so |
6 |
200 |
Delta from previous is 100 so |
7 |
350 |
Delta from previous is 150 so |
8 |
-3 |
Matches previous dynamic alias (variable 4) so |
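The encoding rules can be checked with a small round-trip sketch in Python. This is my reading of the scheme, not code taken from GtkWave: all function names are invented, and the decoder picks between `varint` and `svarint` by peeking at the lowest bit of the first byte (which is the lowest bit of the encoded number).

```python
def encode_varint(value):
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_svarint(value):
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift keeps the sign
        done = (value == 0 and not byte & 0x40) or (value == -1 and bool(byte & 0x40))
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def decode_leb(data, pos, signed):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if signed and byte & 0x40:
                result -= 1 << shift
            return result, pos


def encode_positions(values):
    out = bytearray()
    prev_offset, prev_alias, i = 0, None, 0
    while i < len(values):
        v = values[i]
        if v == 0:  # collapse a run of zeros into one varint
            run = 1
            while i + run < len(values) and values[i + run] == 0:
                run += 1
            out += encode_varint(run << 1)
            i += run
            continue
        if v > 0:  # byte offset: store the delta from the previous offset
            field, prev_offset = v - prev_offset, v
        else:      # dynamic alias: 0 means "same alias as last time"
            field, prev_alias = (0 if v == prev_alias else v), v
        out += encode_svarint((field << 1) | 1)
        i += 1
    return bytes(out)


def decode_positions(data, count):
    values, pos, prev_offset, prev_alias = [], 0, 0, 0
    while len(values) < count:
        if data[pos] & 1:
            raw, pos = decode_leb(data, pos, signed=True)
            field = raw >> 1
            if field > 0:
                prev_offset += field
                values.append(prev_offset)
            else:
                prev_alias = field if field < 0 else prev_alias
                values.append(prev_alias)
        else:
            raw, pos = decode_leb(data, pos, signed=False)
            values.extend([0] * (raw >> 1))
    return values


example = [0, 0, 100, 0, -3, 0, 200, 350, -3]
encoded = encode_positions(example)
print(encoded.hex(), decode_positions(encoded, len(example)) == example)
```

For the example values 0 0 100 0 -3 0 200 350 -3 the nine entries shrink to eleven bytes, and decoding them gives back the original list.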
FST_BL_VCDATA_DYN_ALIAS
This uses a slightly different encoding to the above scheme.
Time Table
The Time Table data is an array of `vc_time_count` `varint`s that encode the time differences between simulation times when a value changes. For instance if value changes occur at these times:
10, 50, 100, 101
Then the Time Table data contains these `varint`s:
10, 40, 50, 1
The array is compressed with ZLib if [vc_compressed_length] and [vc_uncompressed_length] are not equal.
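Turning the Time Table data back into absolute times is a running sum over the decoded deltas. In the Python sketch below the function names are hypothetical, and I am assuming (as the example suggests) that the first delta is relative to time 0.

```python
import zlib
from itertools import accumulate


def read_varint(data, pos):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos


def decode_time_table(data, uncompressed_length, count):
    """Return the absolute simulation times stored in a Time Table."""
    if len(data) != uncompressed_length:
        data = zlib.decompress(data)  # lengths differ: ZLib compressed
    deltas = []
    pos = 0
    for _ in range(count):
        delta, pos = read_varint(data, pos)
        deltas.append(delta)
    # ASSUMPTION: the running sum starts from 0.
    return list(accumulate(deltas))


raw = bytes([10, 40, 50, 1])          # the varints 10, 40, 50, 1
print(decode_time_table(raw, 4, 4))   # [10, 50, 100, 101]
```

The time_index_delta values in the Waves Table then index into the returned list.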
Blackout Block
This records the times that `$dumpoff` and `$dumpon` were called.
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Block type ( |
|
1 |
|
Block length. |
|
9 |
|
Number of blackout entries. |
Then it is followed by `blackout_count` records with this structure:
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Blackout activity. 0 = |
|
1 |
|
Time delta from the previous activity. |
GZip Block
The entire FST file can be optionally repacked using GZip on close. In that case the file appears as a single wrapper block of this type. I do not recommend using or supporting this. I cannot see the advantage over just supporting `.fst.gz` directly.
Name | Offset | Type | Description |
---|---|---|---|
0 |
|
Block type ( |
|
1 |
|
Block length. |
|
9 |
|
Length of the section in bytes (uncompressed) |
|
17 |
- |
The GZip (not ZLib) compressed FST file. |
Enums
WriterPackType
Indicates the type of compression used for Value Change data.
Name | Value | Description |
---|---|---|
|
|
Compressed with ZLib |
|
|
Compressed with FastLZ |
|
|
Compressed with LZ4 |
The GtkWave reader code assumes ZLib if an unknown value is found.
FileType
This is the type of source that was used to generate the signals. The default is `FST_FT_VERILOG`. For informational purposes only; it has no effect on reading the file.
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
ScopeType
Name | Value |
---|---|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
9 |
|
10 |
|
11 |
|
12 |
|
13 |
|
14 |
|
15 |
|
16 |
|
17 |
|
18 |
|
19 |
|
20 |
|
21 |
|
252 |
|
253 |
|
254 |
|
255 |
VarType
Name | Value | Notes |
---|---|---|
|
0 |
|
|
1 |
|
|
2 |
|
|
3 |
|
|
4 |
|
|
5 |
|
|
6 |
|
|
7 |
|
|
8 |
|
|
9 |
|
|
10 |
|
|
11 |
|
|
12 |
|
|
13 |
|
|
14 |
|
|
15 |
|
|
16 |
|
|
17 |
|
|
18 |
|
|
19 |
|
|
20 |
|
|
21 |
|
|
22 |
|
|
23 |
|
|
24 |
32-bit value |
|
25 |
16-bit value |
|
26 |
64-bit value |
|
27 |
8-bit value |
|
28 |
|
|
29 |
VarDir
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
AttrType
Name | Value | Notes |
---|---|---|
|
0 |
This type does not have a matching |
|
1 |
|
|
2 |
|
|
3 |
MiscType
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
ArrayType
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
EnumValueType
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
9 |
|
10 |
|
11 |
|
12 |
|
13 |
|
14 |
|
15 |
PackType
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
SupplementalVarType
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
SupplementalDataType
Name | Value |
---|---|
|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
9 |
|
10 |
|
11 |
|
12 |
|
13 |
|
14 |
|
15 |
|
16 |
Suggestions for FST2
While working on this specification I found a number of things that are a bit weird and could be improved. Here are some suggestions for FST2 (if it ever exists):
- `header_real_endianness` can be used as a magic number to identify files but it would be better to use a more traditional one at the start of the file, ideally including a major version number. These can be combined, e.g. the file can start with `FST2`, `FST3`, etc.
- Little Endian should be used everywhere. Modern computers are all Little Endian. The cost of endianness conversion may be small but the cognitive overhead of having to convert values everywhere is not. Code would be vastly simplified if it just did not need to worry about this.
- Protobuf’s zigzag encoding for signed varints is much easier to deal with than LEB128’s.
- There are way too many compression formats supported. It should probably just support one or two, probably LZ4 and maybe ZStd.
- It may also be worth using prefix varints or grouped varints.
- Strings should use (length, data) instead of null termination.
- The Value Change block puts the lengths of all its tables at various weird places between them. They’re all mandatory. Just put their lengths all in one place in the block header.
- You have to decode the whole Position Table even if you are only interested in a few variables. It would be good to solve that and ideally get rid of the complicated encoding scheme for it.
- You have to decode the whole Bits Array even if you are only interested in a few variables.
- The format does not include a way to store delta cycles, or the order of changes at the same time step. These can be really helpful for debugging.
- It also cannot record transactions.
- Different simulators support different value types. E.g. Verilator only outputs 0 and 1, VCS outputs 0, 1, X, Z, etc. It would be helpful if there was a field in the header block that indicated which values would be encountered. This allows readers to use an efficient in-memory representation of the wave data.