Expand description
Low-level functions for BLK file format, for high level API use the vromf
module
§Binary BLK
This module describes the binary BLK format and all of its know features and flags.
⚠ Warning: Unless stated otherwise, all integers are Little-endian
§Terminology
For ease of writing and reading, there are a few terms i will establish for this chapter
- u32 -> 32-bit unsigned integer
- u64 -> 64-bit unsigned integer
- f32 -> 32 bit float
- f64 -> 64 bit float
- [T; N] -> N long array of T
- offset -> absolute offset from 0 in the file
- nm -> A file containing Strings that is not in the BLK binary itself
- dict -> A ZSTD dictionary used in batch compressing many BLK files
- ULEB -> Unsigned LEB encoded variable length integer format:
https://en.wikipedia.org/wiki/LEB128
- Name/Name map -> String as key or value in the BLK, map is an array of Strings
§Types
BLK has 12 data-types
Name | Byte identifier | Layout | Size in bytes | Inline |
---|---|---|---|---|
String | 0x01 | Zero terminated string | Variable | |
Int | 0x02 | i32 | 4 | yes |
Float | 0x03 | f32 | 4 | yes |
Float2 | 0x04 | [f32; 2] | 8 | |
Float3 | 0x05 | [f32; 3] | 12 | |
Float4 | 0x06 | [f32; 4] | 16 | |
Int2 | 0x07 | [i32; 2] | 8 | |
Int3 | 0x08 | [i32; 3] | 12 | |
Bool | 0x09 | boolean | 4 | yes |
Color | 0x0a | [u8; 4] | 4 | yes |
Float12 | 0x0b | [f32; 12] | 48 | |
Long | 0x0c | i64 | 8 |
§Inlining
When a type is inline, it means that its offset field contains the types data (used for small types 32 bits or smaller). Otherwise, the type data is an offset where the actual payload can be found.
§Kinds of BLK files
There is not just one type of BLK, there are some important differences to denote.
Byte ID | String ID | Description |
---|---|---|
0x00 | BBF | A legacy format that this library does not understand |
0x01 | FAT | A standalone BLK binary, about as normal as it gets |
0x02 | FAT_ZST | Same as FAT, but ZSTD compressed |
0x03 | SLIM | A BLK like FAT, but with all strings outlined to the nm |
0x04 | SLIM_ZST | Same as SLIM, but ZSTD compressed |
0x05 | SLIM_ZSTD_DICT | Same as SLIM, but ZSTD compressed using the dict |
§Name reference
Whenever a String is used in the name-map or field-map, this layout applies. If the tag-bit is set, it means the index has to be looked up from the external nm, otherwise the regular nm,
Tag | Index |
---|---|
1 bit | 31 bit unsigned integer |
§File layout
Now that we understand all necessary terms and types for BLK, were almost ready to encode/decode them.
I will first explain the layout of a regular FAT file, then elaborate on the differences to the other kinds.
The file starts off with a single byte describing its kind, FAT in this case, of course.
After that, we begin the first section of data already called the [Name map].
§Name Map
Defines where any Strings used as keys or values in this BLK. In the case of SLIM, only the names count ULEB will be present, skipping straight to the struct count. In the case of FAT, following the names count (ULEB) N is the names buffer size (ULEB) S. After this comes all null seperated strings that should be as many as N specified.
§Block count
A single ULEB defining how many nested structs the BLK file contains (used later).
§Field map
Similar to the name map, here we first get the amount of fields followed by the size of the payload buffer. The field payloads come before the field definitions and therefore need to be used during the decoding of the fields. Each field definition is 32 bits in size structured as such:
Name ID | Type ID | Offset |
---|---|---|
u24 | u8 | u32 |
When the type is inline, simply interpret the offset as its payload. Otherwise, use the offset relative to the start of the payload buffer, reading as many bytes as the type needs.
§Nesting map
§Until now, we have only read and parsed a list of fields. But you may ask, doesn’t BLK support nested structs? Indeed, it does, and it uses the following layout to figure out which fields and struct correspond to which parent struct. The data layout is as such: |Index|Struct name|field count|sub-structs count|sub-struct index (optional)| |-|-|-|-| |ULEB|Name ID|ULEB|ULEB|
Explaining the algorithm necessary to structure this data is not trivial, so I will explain each value and its purpose with care.
However, I believe that, reading crate::blk::blk_block_hierarchy::FlatBlock
(implementation) together with this text will work better than just the text.
To start off, we first get the index, which uniquely identifies any struct, where 0 is the root/core struct. Together with the index, we can determine the name, which is a Name ID or undefined and irrelevant if the index is 0.
Field count
defines how many fields from the field map belong to this struct in the order as they appear.
Keeping track of the sum of previouscount
’s is important as current count
starts from where the last field ended.
Sub-structs count defines how many substructs there are.
Sub-structs index defines which other struct are contained in this one, using the same indexing system as field count. This value is not present when sub-structs count was 0.
This mechanism is best explained with an example to go with it: Lets use this BLK as our working minimal example:
"vec4f":p4 = 1.25, 2.5, 5, 10
"int":i = 42
"long":i64 = 0x40
"alpha" {
"str":t = "hello"
"bool":b = true
"color":c = 0x1, 0x2, 0x3, 0x4
"gamma" {
"vec2i":ip2 = 3, 4
"vec2f":p2 = 1.25, 2.5
"transform":m = [[1, 0, 0] [0, 1, 0] [0, 0, 1] [1.25, 2.5, 5]]
}
}
"beta" {
"float":r = 1.25
"vec2i":ip2 = 1, 2
"vec3f":p3 = 1.25, 2.5, 5
}
The Nesting map would look like the following:
Index | Name | Indexes | Sub-blocks | Binary representation |
---|---|---|---|---|
0 | N.A. | 0,1,2 | 1,2 | 0x00 0x03 0x02 0x01 |
1 | alpha | 3,4,5 | 3 | 0x04 0x03 0x01 0x03 |
2 | beta | 6,7,8 | 0x0C 0x03 0x00 | |
3 | gamma | 9,10,11 | 0x08 0x03 0x00 |
Modules§
- binary_
deserialize - Implementation for deserializing internal representation to binary form
- blk_
block_ 🔒hierarchy - Decodes flat map of fields into the corresponding nested datastructures
- blk_
string 🔒 - blk_
structure - Defines the recursive/nested structure that BLK files are represented with internally
- blk_
type - Defines the primitive types that BLK stores
- error
- Shared error that is returned from hot functions,
otherwise,
color_eyre::Report
is used - file
- One-byte file header that each blk file begins with
- leb128
- Utility function to decode ULEB128 encoded files https://en.wikipedia.org/wiki/LEB128
- name_
map - Struct storing a shared map of strings that multiple BLK files reference
- plaintext_
deserialize 🔒 - Implementations for deserializing into internal representation format from text
- plaintext_
serialize - Implementations for serializing into human readable text formats from internal representation
- util
- Collection of macros and functions used in all BLK modules
- zstd
- Zstandard unpacking functionality
Structs§
- Decoder
Dictionary - Prepared dictionary for decompression
Functions§
- make_
strict_ test - test_
parse_ 🔒dir - unpack_
blk - Highest-level function for unpacking one BLK explicitly, for direct low level control call
binary_deserialize::parser::parse_blk