I recently implemented BigObj COFF file parsing in cgo (golang/go#24341). In the process, I quickly discovered that Microsoft doesn’t document the binary format anywhere. As far as I can tell, the only mention of BigObj in their official documentation says nothing about the binary format. I didn’t see any other blogs or resources covering this topic either.
I figured it out by reading binutils and LLVM source code, so I’m documenting what I learned while the knowledge is still fresh in my memory.
I’m assuming readers are already familiar with COFF and how it’s structured. This post focuses specifically on how BigObj differs from regular COFF object files.
Header Structure
The first difference is the file header. Regular COFF files start with an IMAGE_FILE_HEADER, but BigObj files use a completely different header structure called ANON_OBJECT_HEADER_BIGOBJ (defined in winnt.h).
Regular COFF Header (IMAGE_FILE_HEADER)
typedef struct {
    WORD  Machine;
    WORD  NumberOfSections;      // 16-bit
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD  SizeOfOptionalHeader;
    WORD  Characteristics;
} IMAGE_FILE_HEADER;
BigObj Header (ANON_OBJECT_HEADER_BIGOBJ)
typedef struct {
    WORD  Sig1;                  // Must be 0x0
    WORD  Sig2;                  // Must be 0xFFFF
    WORD  Version;               // Currently 2
    WORD  Machine;
    DWORD TimeDateStamp;
    BYTE  ClassID[16];           // Magic bytes that identify this as bigobj format
    DWORD SizeOfData;
    DWORD Flags;
    DWORD MetaDataSize;
    DWORD MetaDataOffset;
    DWORD NumberOfSections;      // 32-bit field (16-bit in regular COFF)
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
} ANON_OBJECT_HEADER_BIGOBJ;
Differences
Comparing the two headers reveals several differences:
- Supports more sections: NumberOfSections expands from 16-bit to 32-bit. Obviously, this is the entire point of the format!
- Format identification: BigObj adds Sig1, Sig2, Version, and ClassID. I will explain these later.
- Removed optional header: No SizeOfOptionalHeader field (the optional header is only applicable to executables anyway).
- Removed characteristics: No Characteristics field (mostly relevant to executables as well).
- Additional metadata??: BigObj adds SizeOfData, Flags, MetaDataSize, and MetaDataOffset fields. In practice, these are unused (every toolchain I checked sets them to zero). I am not sure what they are for.
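Because the fields a parser actually needs sit at different offsets in the two headers, it can help to read them into one common shape up front. The following is a minimal sketch of that idea, not taken from any toolchain: `coff_info`, `read_coff_info`, `rd16`, and `rd32` are hypothetical names, the byte offsets are derived by adding up the field sizes in the two structs above, and the input is assumed to be little-endian as in every COFF file I know of.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common view of the fields shared by both header layouts. */
typedef struct {
    uint16_t machine;
    uint32_t number_of_sections;      /* widened to 32 bits for both formats */
    uint32_t pointer_to_symbol_table;
    uint32_t number_of_symbols;
} coff_info;

/* Little-endian readers, so the code does not depend on host byte order. */
static uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short for the header. */
int read_coff_info(const uint8_t *buf, size_t len, int is_bigobj, coff_info *out) {
    if (is_bigobj) {
        if (len < 56) return -1;      /* ANON_OBJECT_HEADER_BIGOBJ is 56 bytes */
        out->machine                 = rd16(buf + 6);
        out->number_of_sections      = rd32(buf + 44);
        out->pointer_to_symbol_table = rd32(buf + 48);
        out->number_of_symbols       = rd32(buf + 52);
    } else {
        if (len < 20) return -1;      /* IMAGE_FILE_HEADER is 20 bytes */
        out->machine                 = rd16(buf + 0);
        out->number_of_sections      = rd16(buf + 2);
        out->pointer_to_symbol_table = rd32(buf + 8);
        out->number_of_symbols       = rd32(buf + 12);
    }
    return 0;
}
```

Reading byte by byte avoids relying on struct packing or alignment, at the cost of hard-coding the offsets.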
Detection
To detect whether a file is BigObj format, check the header for the following properties, which a valid BigObj file always has:
- Sig1 = 0x0000 and Sig2 = 0xFFFF
- Version is always 2. Not sure if this needs to be checked.
- ClassID must match these magic bytes: {0xC7, 0xA1, 0xBA, 0xD1, 0xEE, 0xBA, 0xA9, 0x4B, 0xAF, 0x20, 0xFA, 0xF6, 0x6A, 0xA4, 0xDC, 0xB8}
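The checks above can be sketched as a single function over the first bytes of the file. This is an illustration rather than a reference implementation: `is_bigobj` and `BIGOBJ_CLASS_ID` are hypothetical names, the offsets (0, 2, 4, and 12) come from the header struct shown earlier, and accepting any Version of 2 or higher is an assumption of this sketch, since it is unclear whether that field needs checking at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The ClassID magic bytes listed above. */
static const uint8_t BIGOBJ_CLASS_ID[16] = {
    0xC7, 0xA1, 0xBA, 0xD1, 0xEE, 0xBA, 0xA9, 0x4B,
    0xAF, 0x20, 0xFA, 0xF6, 0x6A, 0xA4, 0xDC, 0xB8
};

/* Returns 1 if buf starts with what looks like a BigObj header, else 0. */
int is_bigobj(const uint8_t *buf, size_t len) {
    if (len < 56) return 0;           /* too short to hold the 56-byte header */

    uint16_t sig1    = (uint16_t)(buf[0] | (buf[1] << 8));
    uint16_t sig2    = (uint16_t)(buf[2] | (buf[3] << 8));
    uint16_t version = (uint16_t)(buf[4] | (buf[5] << 8));

    if (sig1 != 0x0000 || sig2 != 0xFFFF) return 0;
    if (version < 2) return 0;        /* assumption: 2 or newer is acceptable */

    /* ClassID starts at offset 12, after Machine and TimeDateStamp. */
    return memcmp(buf + 12, BIGOBJ_CLASS_ID, sizeof BIGOBJ_CLASS_ID) == 0;
}
```

A regular COFF file fails the first test immediately, because its first two bytes hold a non-zero Machine value on any real target.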
Symbol Structure
BigObj also uses a different symbol format. While regular COFF symbols are 18 bytes each, BigObj symbols are 20 bytes, due to the 32-bit SectionNumber field.
Regular COFF Symbol
typedef struct {
    BYTE  Name[8];
    DWORD Value;
    SHORT SectionNumber;         // 16-bit, signed (special values are negative)
    WORD  Type;
    BYTE  StorageClass;
    BYTE  NumberOfAuxSymbols;
} IMAGE_SYMBOL;
BigObj Symbol
typedef struct {
    BYTE  Name[8];
    DWORD Value;
    LONG  SectionNumber;         // 32-bit, signed!
    WORD  Type;
    BYTE  StorageClass;
    BYTE  NumberOfAuxSymbols;
} IMAGE_SYMBOL_EX;
You need to take this into account when indexing and reading the symbol table.
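As a rough sketch of what indexing looks like in practice, the function below decodes the record at a given index using an 18- or 20-byte stride. The names `coff_symbol`, `read_symbol`, and `count_primary_symbols` are hypothetical, and two assumptions are baked in: the section number is treated as signed (so special values such as -1 and -2 come out negative), and auxiliary records occupy the same slot size as the symbol they follow. Note also that declaring these structs without 2-byte packing would pad them to 20 bytes on most compilers, which is one reason this sketch reads raw bytes instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one symbol record (the Name field is skipped). */
typedef struct {
    uint32_t value;
    int32_t  section_number;         /* signed in both formats after decoding */
    uint16_t type;
    uint8_t  storage_class;
    uint8_t  number_of_aux_symbols;
} coff_symbol;

static uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes record `i` of the symbol table starting at `symtab`.
 * The caller must ensure the record lies inside the buffer. */
void read_symbol(const uint8_t *symtab, uint32_t i, int is_bigobj, coff_symbol *out) {
    size_t record_size = is_bigobj ? 20 : 18;
    const uint8_t *p = symtab + (size_t)i * record_size;

    out->value = rd32(p + 8);                          /* after the 8-byte Name */
    if (is_bigobj) {
        out->section_number = (int32_t)rd32(p + 12);
        p += 16;                                       /* Type follows a 4-byte field */
    } else {
        out->section_number = (int16_t)rd16(p + 12);   /* sign-extend 16 bits */
        p += 14;                                       /* Type follows a 2-byte field */
    }
    out->type                  = rd16(p);
    out->storage_class         = p[2];
    out->number_of_aux_symbols = p[3];
}

/* Counts primary symbols among `n` records, stepping over auxiliary records. */
uint32_t count_primary_symbols(const uint8_t *symtab, uint32_t n, int is_bigobj) {
    uint32_t count = 0;
    for (uint64_t i = 0; i < n; ) {
        coff_symbol s;
        read_symbol(symtab, (uint32_t)i, is_bigobj, &s);
        count++;
        i += 1u + s.number_of_aux_symbols;
    }
    return count;
}
```

NumberOfSymbols in the header counts records, auxiliary ones included, which is why the loop advances by `1 + number_of_aux_symbols` rather than by one.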
Symbol Table Parsing
The Name field of the symbol is actually a union.
You can conceptually think of it like this:
typedef union {
    BYTE ShortName[8];           // Name <= 8 chars: stored directly
    struct {
        DWORD Zeroes;            // 0x00000000 indicates long name
        DWORD Offset;            // Offset into string table
    } LongName;
} SYMBOL_NAME;
If the name of the symbol is longer than 8 bytes, it is stored in the string table instead: the first 4 bytes of Name are zero, and the last 4 bytes are an offset into the string table. A name of exactly 8 bytes is stored inline with no terminating NUL. This applies to both regular COFF and BigObj.
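Here is a small sketch of how the two cases might be decoded. `symbol_name` and `rd32` are hypothetical names, and the sketch assumes the offset is measured from the very start of the string table, which begins with a 4-byte size field, so any offset below 4 is rejected as malformed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copies the symbol name into `out` as a NUL-terminated string.
 * `strtab` points at the start of the string table (its size field included).
 * Returns 0 on success, -1 on a malformed name or if `out` is too small. */
int symbol_name(const uint8_t name[8], const uint8_t *strtab, size_t strtab_len,
                char *out, size_t out_len) {
    if (rd32(name) != 0) {
        /* Short name: up to 8 bytes, NUL-padded, unterminated when exactly 8. */
        size_t n = 0;
        while (n < 8 && name[n] != 0) n++;
        if (n + 1 > out_len) return -1;
        memcpy(out, name, n);
        out[n] = '\0';
        return 0;
    }

    /* Long name: the last 4 bytes are an offset into the string table. */
    uint32_t off = rd32(name + 4);
    if (off < 4 || off >= strtab_len) return -1;

    const uint8_t *start = strtab + off;
    const uint8_t *end = memchr(start, 0, strtab_len - off);
    if (end == NULL) return -1;          /* name runs off the end of the table */

    size_t n = (size_t)(end - start);
    if (n + 1 > out_len) return -1;
    memcpy(out, start, n + 1);           /* includes the terminating NUL */
    return 0;
}
```

The same function works for both formats, since only the fields after Name differ between the 18-byte and 20-byte records.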
The string table immediately follows the symbol table, so you need to take into account the symbol size to locate it:
// Calculate string table location.
// `header` stands for whichever header struct the file uses; both have
// PointerToSymbolTable and NumberOfSymbols fields.
// 64-bit math keeps a large NumberOfSymbols from overflowing the product.
ULONGLONG symbolSize = isBigObj
    ? sizeof(IMAGE_SYMBOL_EX)    // BigObj: 20 bytes per symbol
    : sizeof(IMAGE_SYMBOL);      // Regular COFF: 18 bytes per symbol
ULONGLONG stringTableStart = header->PointerToSymbolTable +
    symbolSize * header->NumberOfSymbols;
The first 4 bytes of the string table are its total size in bytes (including those initial 4 bytes).
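Putting the offset calculation and the size prefix together, a bounds-checked lookup might look like the sketch below. `locate_string_table` and `rd32` are hypothetical names. Treating a PointerToSymbolTable of zero as "no symbol table" and rejecting a declared size that runs past the end of the file are choices made for this sketch, not requirements stated anywhere official that I know of.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Locates the string table inside a whole-file buffer.
 * On success stores its file offset and total size (prefix included)
 * and returns 0; returns -1 if the table is absent or out of bounds. */
int locate_string_table(const uint8_t *file, size_t file_len,
                        uint32_t pointer_to_symbol_table,
                        uint32_t number_of_symbols, int is_bigobj,
                        uint64_t *offset, uint32_t *size) {
    if (pointer_to_symbol_table == 0) return -1;   /* assumed: no symbol table */

    /* 64-bit math: 20 * NumberOfSymbols can exceed 32 bits in a hostile file. */
    uint64_t symbol_size = is_bigobj ? 20 : 18;
    uint64_t start = (uint64_t)pointer_to_symbol_table +
                     symbol_size * number_of_symbols;

    if (start + 4 > file_len) return -1;           /* no room for the size field */

    uint32_t total = rd32(file + start);
    if (total < 4) return -1;                      /* size counts its own 4 bytes */
    if (total > file_len - start) return -1;       /* table would overrun the file */

    *offset = start;
    *size = total;
    return 0;
}
```

A table whose size field is exactly 4 is valid and simply contains no strings.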
Conclusion
The key changes that BigObj makes are minimal:
- A different header structure
- 32-bit section counts instead of 16-bit
- 32-bit symbol section numbers instead of 16-bit
- 20-byte symbols instead of 18-byte symbols
Everything else works exactly like regular COFF. Implementing BigObj support is relatively straightforward once you understand these differences.
Hopefully this saves someone else from having to dig through binutils and LLVM source like I did!