Module blk

Low-level functions for the BLK file format. For a high-level API, use the vromf module.

§Binary BLK

This module describes the binary BLK format and all of its known features and flags.

Warning: Unless stated otherwise, all integers are little-endian.

§Terminology

For ease of writing and reading, I will establish a few terms for this chapter:

  • u32 -> 32-bit unsigned integer
  • u64 -> 64-bit unsigned integer
  • f32 -> 32-bit float
  • f64 -> 64-bit float
  • [T; N] -> array of T with N elements
  • offset -> absolute offset from 0 in the file
  • nm -> A file containing Strings that is not in the BLK binary itself
  • dict -> A ZSTD dictionary used in batch compressing many BLK files
  • ULEB -> Unsigned LEB encoded variable length integer format: https://en.wikipedia.org/wiki/LEB128
  • Name/Name map -> A name is a String used as a key or value in the BLK; a name map is an array of such Strings
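
Since ULEB values appear throughout the format, a minimal decoder sketch may help. The function name decode_uleb128 is hypothetical and is not taken from this crate's leb128 module; it only illustrates the encoding described on the linked page.

```rust
/// Decodes one unsigned LEB128 integer from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or the encoding is longer than 10 bytes.
fn decode_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None; // an 11th byte cannot contribute to a u64
        }
        // The low 7 bits carry data, least significant group first.
        value |= u64::from(byte & 0x7f) << shift;
        // A clear high bit marks the final byte of the integer.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final byte
}
```

Returning the consumed length alongside the value lets a caller advance through a buffer that holds several ULEB values back to back.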

§Types

BLK has 12 data-types

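| Name | Byte identifier | Layout | Size in bytes | Inline |
| --- | --- | --- | --- | --- |
| String | 0x01 | Zero terminated string | Variable | |
| Int | 0x02 | i32 | 4 | yes |
| Float | 0x03 | f32 | 4 | yes |
| Float2 | 0x04 | [f32; 2] | 8 | |
| Float3 | 0x05 | [f32; 3] | 12 | |
| Float4 | 0x06 | [f32; 4] | 16 | |
| Int2 | 0x07 | [i32; 2] | 8 | |
| Int3 | 0x08 | [i32; 3] | 12 | |
| Bool | 0x09 | boolean | 4 | yes |
| Color | 0x0a | [u8; 4] | 4 | yes |
| Float12 | 0x0b | [f32; 12] | 48 | |
| Long | 0x0c | i64 | 8 | |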

§Inlining

When a type is inline, its offset field contains the type's data directly (used for small types of 32 bits or fewer). Otherwise, the offset field holds an offset at which the actual payload can be found.
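
The type table and the inlining rule can be sketched as a small enum. The name BlkTypeId and its methods are hypothetical and are not the API of the blk_type module. The table does not mark String as inline, so this sketch treats it as not inline, which is an assumption.

```rust
/// The 12 BLK data types with the byte identifiers from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum BlkTypeId {
    String = 0x01,
    Int = 0x02,
    Float = 0x03,
    Float2 = 0x04,
    Float3 = 0x05,
    Float4 = 0x06,
    Int2 = 0x07,
    Int3 = 0x08,
    Bool = 0x09,
    Color = 0x0a,
    Float12 = 0x0b,
    Long = 0x0c,
}

impl BlkTypeId {
    /// Maps a byte identifier to a type, or `None` for unknown bytes.
    fn from_byte(byte: u8) -> Option<Self> {
        Some(match byte {
            0x01 => Self::String,
            0x02 => Self::Int,
            0x03 => Self::Float,
            0x04 => Self::Float2,
            0x05 => Self::Float3,
            0x06 => Self::Float4,
            0x07 => Self::Int2,
            0x08 => Self::Int3,
            0x09 => Self::Bool,
            0x0a => Self::Color,
            0x0b => Self::Float12,
            0x0c => Self::Long,
            _ => return None,
        })
    }

    /// Payload size in bytes; `None` for the variable-length String.
    fn size(self) -> Option<usize> {
        match self {
            Self::String => None,
            Self::Int | Self::Float | Self::Bool | Self::Color => Some(4),
            Self::Float2 | Self::Int2 | Self::Long => Some(8),
            Self::Float3 | Self::Int3 => Some(12),
            Self::Float4 => Some(16),
            Self::Float12 => Some(48),
        }
    }

    /// True for the types the table marks as inline.
    fn is_inline(self) -> bool {
        matches!(self, Self::Int | Self::Float | Self::Bool | Self::Color)
    }
}
```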

§Kinds of BLK files

There is more than one kind of BLK file, and the differences between them are important to note.

| Byte ID | String ID | Description |
| --- | --- | --- |
| 0x00 | BBF | A legacy format that this library does not understand |
| 0x01 | FAT | A standalone BLK binary, about as normal as it gets |
| 0x02 | FAT_ZST | Same as FAT, but ZSTD compressed |
| 0x03 | SLIM | A BLK like FAT, but with all strings outlined to the nm |
| 0x04 | SLIM_ZST | Same as SLIM, but ZSTD compressed |
| 0x05 | SLIM_ZSTD_DICT | Same as SLIM, but ZSTD compressed using the dict |
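
As a rough sketch, the kind byte can be mapped to an enum that answers the questions a decoder needs to ask first. The name BlkKind and its helper methods are hypothetical and are not the types exposed by the file module.

```rust
/// The kinds of BLK file, keyed by the leading byte from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlkKind {
    Bbf,
    Fat,
    FatZst,
    Slim,
    SlimZst,
    SlimZstdDict,
}

impl BlkKind {
    /// Maps the first byte of a file to its kind, or `None` if unknown.
    fn from_byte(byte: u8) -> Option<Self> {
        Some(match byte {
            0x00 => Self::Bbf,
            0x01 => Self::Fat,
            0x02 => Self::FatZst,
            0x03 => Self::Slim,
            0x04 => Self::SlimZst,
            0x05 => Self::SlimZstdDict,
            _ => return None,
        })
    }

    /// True if the body must be ZSTD-decompressed before parsing.
    fn is_compressed(self) -> bool {
        matches!(self, Self::FatZst | Self::SlimZst | Self::SlimZstdDict)
    }

    /// True if the strings live in the external nm instead of the file.
    fn needs_name_map(self) -> bool {
        matches!(self, Self::Slim | Self::SlimZst | Self::SlimZstdDict)
    }

    /// True if decompression additionally requires the shared dict.
    fn needs_dict(self) -> bool {
        self == Self::SlimZstdDict
    }
}
```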

§Name reference

Whenever a String is used in the name map or field map, this layout applies. If the tag bit is set, the index has to be looked up in the external nm; otherwise, it is looked up in the regular nm.

| Tag | Index |
| --- | --- |
| 1 bit | 31-bit unsigned integer |
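
A sketch of splitting such a reference into its two parts follows. The text above does not say which bit holds the tag, so this example assumes the tag is the most significant bit of a little-endian u32; that placement, and the names NameRef and decode_name_ref, are assumptions made for illustration only.

```rust
/// A decoded name reference. Hypothetical type, for illustration.
#[derive(Debug, PartialEq, Eq)]
struct NameRef {
    /// True if the index refers to the external nm.
    external: bool,
    /// The 31-bit index into the selected name map.
    index: u32,
}

/// Splits a raw 32-bit name reference into tag and index.
/// ASSUMPTION: the tag occupies the most significant bit.
fn decode_name_ref(raw: [u8; 4]) -> NameRef {
    let word = u32::from_le_bytes(raw);
    NameRef {
        external: word >> 31 == 1,
        index: word & 0x7fff_ffff,
    }
}
```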

§File layout

Now that we understand all necessary terms and types for BLK, we're almost ready to encode and decode them. I will first explain the layout of a regular FAT file, then elaborate on the differences to the other kinds.

The file starts with a single byte describing its kind, FAT in this case. After that comes the first section of data, the Name map.

§Name Map

Defines all Strings used as keys or values in this BLK. In the case of SLIM, only the names count (ULEB) is present, and parsing skips straight to the block count. In the case of FAT, the names count N (ULEB) is followed by the names buffer size S (ULEB). After this come the null-separated strings, of which there should be exactly N.
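
The FAT case can be sketched roughly as follows. The function names are hypothetical and this is not the crate's name_map implementation. The sketch assumes the strings are UTF-8 (decoded lossily) and tolerates a single trailing null byte at the end of the buffer.

```rust
/// Reads one ULEB128 value at `*pos` and advances `*pos` past it.
fn read_uleb(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // too many bytes for a u64
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Parses a FAT name map: names count N, buffer size S, then S bytes of
/// null-separated strings. Returns the names and the offset just past
/// the buffer, or `None` if the input is truncated or holds fewer than
/// N strings.
fn parse_fat_name_map(bytes: &[u8]) -> Option<(Vec<String>, usize)> {
    let mut pos = 0;
    let count = read_uleb(bytes, &mut pos)? as usize;
    let size = read_uleb(bytes, &mut pos)? as usize;
    let end = pos.checked_add(size)?;
    let buffer = bytes.get(pos..end)?;
    // ASSUMPTION: a final terminating null is allowed and ignored.
    let buffer = buffer.strip_suffix(&[0u8]).unwrap_or(buffer);
    let names: Vec<String> = buffer
        .split(|&b| b == 0)
        .take(count)
        .map(|s| String::from_utf8_lossy(s).into_owned())
        .collect();
    if names.len() != count {
        return None;
    }
    Some((names, end))
}
```

The returned offset is where the block count would be read next.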

§Block count

A single ULEB defining how many nested structs the BLK file contains (used later).

§Field map

Similar to the name map, here we first get the number of fields, followed by the size of the payload buffer. The field payloads come before the field definitions and are therefore needed while decoding the fields. Each field definition is 64 bits in size, structured as follows:

| Name ID | Type ID | Offset |
| --- | --- | --- |
| u24 | u8 | u32 |

When the type is inline, simply interpret the offset as its payload. Otherwise, use the offset relative to the start of the payload buffer, reading as many bytes as the type needs.
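
A minimal sketch of decoding one field definition and fetching its payload is given below. It assumes the three parts are packed in the order the table lists them (3 bytes of Name ID, 1 byte of Type ID, 4 bytes of Offset), all little-endian. The names FieldDef, decode_field_def and field_payload are hypothetical, and the caller is expected to supply whether the type is inline and how many bytes it needs.

```rust
/// One decoded field definition. Hypothetical type, for illustration.
#[derive(Debug, PartialEq, Eq)]
struct FieldDef {
    name_id: u32, // only the low 24 bits are used
    type_id: u8,
    offset: u32,
}

/// Decodes the 8 bytes of a field definition.
/// ASSUMPTION: parts are packed in table order, little-endian.
fn decode_field_def(raw: [u8; 8]) -> FieldDef {
    FieldDef {
        name_id: u32::from_le_bytes([raw[0], raw[1], raw[2], 0]),
        type_id: raw[3],
        offset: u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]),
    }
}

/// Returns the raw payload bytes of a field.
/// Inline types reuse the 4 offset bytes as their payload; other types
/// read `size` bytes from `payload_buffer`, starting at the offset.
fn field_payload(
    def: &FieldDef,
    inline: bool,
    size: usize,
    payload_buffer: &[u8],
) -> Option<Vec<u8>> {
    if inline {
        Some(def.offset.to_le_bytes().to_vec())
    } else {
        let start = def.offset as usize;
        let end = start.checked_add(size)?;
        payload_buffer.get(start..end).map(|bytes| bytes.to_vec())
    }
}
```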

§Nesting map

Until now, we have only read and parsed a list of fields. But you may ask, doesn’t BLK support nested structs? Indeed, it does, and it uses the following layout to figure out which fields and structs correspond to which parent struct. The data layout is as follows:

| Index | Struct name | field count | sub-structs count | sub-struct index (optional) |
| --- | --- | --- | --- | --- |
| ULEB | Name ID | ULEB | ULEB | |

Explaining the algorithm necessary to structure this data is not trivial, so I will explain each value and its purpose with care. However, I believe that reading crate::blk::blk_block_hierarchy::FlatBlock (the implementation) together with this text will work better than the text alone.

To start off, we first get the index, which uniquely identifies any struct, where 0 is the root/core struct. Together with the index, we can determine the name, which is a Name ID. The name is undefined and irrelevant if the index is 0.

Field count defines how many fields from the field map belong to this struct, in the order in which they appear. Keeping track of the sum of the previous counts is important, as the current struct's fields start where the previous struct's fields ended.

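Sub-structs count defines how many direct sub-structs this struct has.

Sub-struct index defines which other structs are contained in this one: it is the index of the first sub-struct, and the remaining sub-structs follow it consecutively. In the example below, the root has 2 sub-structs starting at index 1, so it contains structs 1 and 2. This value is not present when sub-structs count is 0.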

This mechanism is best explained with an example. Let's use this BLK as our minimal working example:

"vec4f":p4 = 1.25, 2.5, 5, 10
"int":i = 42
"long":i64 = 0x40
"alpha" {
   	"str":t = "hello"
   	"bool":b = true
   	"color":c = 0x1, 0x2, 0x3, 0x4
   	"gamma" {
   		"vec2i":ip2 = 3, 4
   		"vec2f":p2 = 1.25, 2.5
   		"transform":m = [[1, 0, 0] [0, 1, 0] [0, 0, 1] [1.25, 2.5, 5]]
   	}
}
"beta" {
   	"float":r = 1.25
   	"vec2i":ip2 = 1, 2
   	"vec3f":p3 = 1.25, 2.5, 5
}

The Nesting map would look like the following:

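| Index | Name | Indexes | Sub-blocks | Binary representation |
| --- | --- | --- | --- | --- |
| 0 | N.A. | 0,1,2 | 1,2 | 0x00 0x03 0x02 0x01 |
| 1 | alpha | 3,4,5 | 3 | 0x04 0x03 0x01 0x03 |
| 2 | beta | 6,7,8 | | 0x0C 0x03 0x00 |
| 3 | gamma | 9,10,11 | | 0x08 0x03 0x00 |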
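
The bookkeeping behind the Indexes and Sub-blocks columns can be sketched as follows. This is not the FlatBlock implementation; the names FlatEntry, resolve_ranges and example_entries are hypothetical. The sketch assumes, based on the table above, that the sub-struct index names the first of a consecutive run of sub-structs.

```rust
use std::ops::Range;

/// One nesting-map entry after its ULEB values have been decoded.
struct FlatEntry {
    field_count: usize,
    sub_count: usize,
    /// Only meaningful when `sub_count` is greater than 0.
    first_sub: usize,
}

/// For each entry, returns (field index range, sub-struct index range).
fn resolve_ranges(entries: &[FlatEntry]) -> Vec<(Range<usize>, Range<usize>)> {
    // Running sum of the field counts of all previous entries.
    let mut next_field = 0;
    entries
        .iter()
        .map(|entry| {
            let fields = next_field..next_field + entry.field_count;
            next_field = fields.end;
            // ASSUMPTION: sub-structs occupy consecutive indexes.
            let subs = if entry.sub_count == 0 {
                0..0
            } else {
                entry.first_sub..entry.first_sub + entry.sub_count
            };
            (fields, subs)
        })
        .collect()
}

/// The four entries from the example table: root, alpha, beta, gamma.
fn example_entries() -> Vec<FlatEntry> {
    vec![
        FlatEntry { field_count: 3, sub_count: 2, first_sub: 1 },
        FlatEntry { field_count: 3, sub_count: 1, first_sub: 3 },
        FlatEntry { field_count: 3, sub_count: 0, first_sub: 0 },
        FlatEntry { field_count: 3, sub_count: 0, first_sub: 0 },
    ]
}
```

Applied to the example entries, the ranges come out as fields 0..3 with sub-structs 1..3 for the root, fields 3..6 with sub-struct 3..4 for alpha, and fields 6..9 and 9..12 with no sub-structs for beta and gamma, which lines up with the table.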

Modules§

binary_deserialize
Implementation for deserializing internal representation to binary form
blk_block_hierarchy 🔒
Decodes the flat map of fields into the corresponding nested data structures
blk_string 🔒
blk_structure
Defines the recursive/nested structure that BLK files are represented with internally
blk_type
Defines the primitive types that BLK stores
error
Shared error that is returned from hot functions, otherwise, color_eyre::Report is used
file
One-byte file header that each blk file begins with
leb128
Utility function to decode ULEB128 encoded files https://en.wikipedia.org/wiki/LEB128
name_map
Struct storing a shared map of strings that multiple BLK files reference
plaintext_deserialize 🔒
Implementations for deserializing into internal representation format from text
plaintext_serialize
Implementations for serializing into human readable text formats from internal representation
util
Collection of macros and functions used in all BLK modules
zstd
Zstandard unpacking functionality

Structs§

DecoderDictionary
Prepared dictionary for decompression

Functions§

make_strict_test
test_parse_dir 🔒
unpack_blk
Highest-level function for unpacking one BLK explicitly. For direct low-level control, call binary_deserialize::parser::parse_blk