File Names and Signatures
The recommended extension for a Dudley layout file is “.dud”. These files must be UTF-8 encoded if they contain non-ASCII characters (which is only possible in quoted names). A standalone “.dud” file is generally intended to be a template describing many (or at least several) binary data files or streams.
A Dudley layout may describe the contents of any other file format (HDF5, netCDF, PDB, FITS, etc.). However, Dudley also defines its own native self-describing binary file format. The preferred extension for a native Dudley binary file is “.bd” (for binary data). The layout text may be appended at the end of a “.bd” file to make a single self-describing file like an HDF5 file, or the contents of the “.bd” file may be described by a separate “.dud” file.
In either case, a “.bd” file begins with one of two eight-byte signatures:
8d < B D 0d 0a 1a 0a    (8d 3c 42 44 0d 0a 1a 0a)
8d > B D 0d 0a 1a 0a    (8d 3e 42 44 0d 0a 1a 0a)
This was inspired by the PNG header. The rationale is that non-binary FTP file transfers will corrupt either the 0d 0a sequence or the 0a character, while the 1a character stops terminal output on MS-DOS (and maybe Windows). The 8d character is chosen for four reasons: it is not a valid first byte of a UTF-8 stream, it is not defined in the CP-1252 character encoding, it is not printable in the latin-1 encoding, and any file transfer which resets the top bit to zero will corrupt it.
The < variant makes the default byte order little endian (least significant byte first) while the > variant makes the default byte order big endian. Any multi-byte primitive types declared as | (or unprefixed) in the layout will have this default byte order in this specific “.bd” file.
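As an illustration of the signature check, here is a minimal Python sketch that recognizes the two signatures listed above and reports the default byte order. The function name `detect_byte_order` and the choice to raise `ValueError` on a mismatch are assumptions made for this example, not part of Dudley itself.

```python
# The two native signatures, written out byte for byte from the table above.
SIGNATURES = {
    b"\x8d<BD\r\n\x1a\n": "<",  # little endian default
    b"\x8d>BD\r\n\x1a\n": ">",  # big endian default
}


def detect_byte_order(header: bytes) -> str:
    """Return '<' or '>' for a native signature (hypothetical helper).

    Raises ValueError if the first eight bytes match neither signature,
    for example after a text-mode transfer has altered the 0d 0a pair.
    """
    try:
        return SIGNATURES[bytes(header[:8])]
    except KeyError:
        raise ValueError("not a native Dudley signature") from None
```

The returned character has the same meaning as the byte order prefix used by Python's `struct` module, so it can be passed directly into a `struct` format string.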
Furthermore, the second eight bytes of a native file are either all zero, or the address of the layout appended to the end of the binary file, in the byte order specified by the < or > character in the first eight bytes. This address is also where the first byte of any new data will be written if the file is subsequently extended. Negative values are reserved for internal use by Dudley involving multiple streams being packed into a single file.
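The following Python sketch shows one way to read this second eight-byte field. It assumes the field is a signed 64-bit integer (suggested by the mention of negative values, but not stated explicitly here), and the name `read_layout_address` is hypothetical.

```python
import struct


def read_layout_address(header: bytes) -> int:
    """Decode bytes 8..15 of a native file header (hypothetical helper).

    The byte order is taken from the second byte of the signature.
    A result of 0 means no layout address is recorded; negative
    values are reserved by Dudley.
    """
    order = chr(header[1])
    if order not in ("<", ">"):
        raise ValueError("second signature byte must be '<' or '>'")
    # Assumption: the field is a signed 64-bit integer ('q' in struct).
    (address,) = struct.unpack(order + "q", header[8:16])
    return address
```

A caller would typically read the first 16 bytes of the file in binary mode and pass them to this function.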
In a native “.bd” file with this signature, the common offset for all addresses (explicit or inferred) is 16 bytes. That is, address 0 of the layout is the first byte following the 16-byte header (the eight-byte signature plus the eight-byte layout address).
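In code, the 16-byte offset amounts to a simple translation from a layout address to a position in the file. The sketch below is illustrative only; `HEADER_SIZE` and `file_offset` are names invented for this example.

```python
HEADER_SIZE = 16  # eight-byte signature plus eight-byte layout address


def file_offset(layout_address: int) -> int:
    """Map a layout address to an absolute position in a native file."""
    if layout_address < 0:
        raise ValueError("layout addresses must be non-negative")
    return HEADER_SIZE + layout_address
```

For example, `f.seek(file_offset(0))` positions an open file `f` at the first byte after the header.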
Note that Dudley layouts describing files in other formats will generally only apply to a single file (since no other format is made to describe multiple files), and will also normally be generated by a converter program rather than by a human. Such layouts will generally never use indeterminate primitives (| or unprefixed), instead specifying the particular byte order in the single file they describe.