SectorC: An Amazing 512-Byte C Compiler for Efficient Programming

Have you come across an incredible project called SectorC?

It is a C compiler written in x86-16 assembly that fits in the 512-byte boot sector of an x86 machine. If that already sounds impressive, wait until you see the features it supports!

SectorC supports a subset of C that is large enough to write really interesting programs. It includes global variables, functions, if and while statements, many operators, pointers, comments, and more. In addition, it is probably the tiniest C compiler ever written.

Encoded in base64, this is what SectorC looks like:

6gUAwAdoADAfaAAgBzH/6DABPfQYdQXoJQHr8+gjAVOJP+gSALDDqluB+9lQdeAG/zdoAEAfy+gI
AegFAYnYg/hNdFuE9nQNsOiqiwcp+IPoAqvr4j3/FXUG6OUAquvXPVgYdQXoJgDrGj0C2nUGV+gb
AOsF6CgA68Ow6apYKfiD6AKrifgp8CaJRP7rrOg4ALiFwKu4D4Srq1fonP9ewz2N/HUV6JoA6BkA
ieu4iQRQuIs26IAAWKvD6AcAieu4iQbrc4nd6HkA6HYA6DgAHg4fvq8Bra052HQGhcB19h/DrVCw
UKroWQDoGwC4WZGrW4D/wHUMuDnIq7i4AKu4AA+ridirH8M9jfx1COgzALiLBOucg/j4dQXorf/r
JIP49nUI6BwAuI0G6wyE0nQFsLiq6wa4iwarAduJ2KvrA+gAAOhLADwgfvkx2zHJPDkPnsI8IH4S
weEIiMFr2wqD6DABw+gqAOvqicg9Ly90Dj0qL3QSPSkoD5TGidjD6BAAPAp1+eu86Ln/g/jDdfjr
slIx9osEMQQ8O3QUuAACMdLNFIDkgHX0PDt1BIkEMcBaw/v/A8H9/yvB+v/34fb/I8FMAAvBLgAz
wYQA0+CaANP4jwCUwHf/lcAMAJzADgCfwIUAnsCZAJ3AAAAAAAAAAAAAAAAAAAAAAAAAAAAAVao=
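To get a feel for what that listing encodes, here is a minimal Python sketch that decodes base64 text and checks two properties of a BIOS boot sector: a length of exactly 512 bytes and the closing signature bytes 0x55 0xAA. The helper name check_boot_sector is hypothetical and not part of SectorC, and the demo runs on a synthetic zero-filled sector rather than on the listing itself.

```python
import base64

SECTOR_SIZE = 512                 # size of a BIOS boot sector in bytes
BOOT_SIGNATURE = b"\x55\xaa"      # last two bytes of a bootable sector


def check_boot_sector(b64_text: str) -> bytes:
    """Decode base64 text and verify that it looks like a boot sector.

    Hypothetical helper for illustration; not part of SectorC.
    """
    # Drop line breaks so a multi-line listing can be pasted as-is.
    raw = base64.b64decode("".join(b64_text.split()))
    if len(raw) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(raw)}")
    if raw[-2:] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55 0xAA boot signature")
    return raw


# Synthetic demo: a zero-filled sector that ends with the signature.
demo = base64.b64encode(bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE).decode("ascii")
sector = check_boot_sector(demo)

# The last four characters of the listing above decode to the signature.
tail = base64.b64decode("Vao=")
```

Passing the full listing to the same function is a quick way to confirm that it was copied without damage.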

Xorvoid, the author of this project, was inspired by the work of Justine Tunney and Tom Murphy. To shrink the tokenizer, the author defined a dialect of C called the “Barely C Programming Language,” in which sequences such as “if(” and “){” are written without spaces and treated as single “mega-tokens,” so the compiler has fewer tokens to recognize.

Here is the grammar of Barely C:

program     = (var_decl | func_decl)+
var_decl    = "int" identifier ";"
func_decl   = "void" func_name "{" statement* "}"
func_name   = <identifier that ends in "()" with no space>
statement   = "if(" expr "){" statement* "}"
            | "while(" expr "){" statement* "}"
            | "asm" integer ";"
            | func_name ";"
            | assign_expr ";"
assign_expr = deref? identifier "=" expr
deref       = "*(int*)"
expr        = unary (op unary)?
unary       = deref identifier
            | "&" identifier
            | "(" expr ")"
            | identifier
            | integer
op          = "+" | "-" | "&" | "|" | "^" | "<<" | ">>"
            | "==" | "!=" | "<" | ">" | "<=" | ">="
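As an illustration of how little machinery this grammar needs, the following Python sketch recognizes Barely C programs. It is not how SectorC works internally (SectorC is written in assembly); it assumes that tokens are separated by whitespace, so that mega-tokens like "if(" arrive as single strings, and it makes no attempt to reject keywords used as identifiers. The class name Parser and the SAMPLE program are hypothetical.

```python
import re

OPS = {"+", "-", "&", "|", "^", "<<", ">>", "==", "!=", "<", ">", "<=", ">="}
DEREF = "*(int*)"
IDENT = re.compile(r"[A-Za-z_]\w*")
FUNC = re.compile(r"[A-Za-z_]\w*\(\)")


class Parser:
    """Recognizer for the grammar above; tokens are split on whitespace."""

    def __init__(self, source: str):
        self.toks = source.split()
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take_if(self, ok, what):
        # Consume the next token if the predicate accepts it.
        tok = self.peek()
        if tok is None or not ok(tok):
            raise SyntaxError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def expect(self, want):
        return self.take_if(lambda t: t == want, repr(want))

    def ident(self):
        return self.take_if(IDENT.fullmatch, "identifier")

    def program(self) -> int:
        """Parse (var_decl | func_decl)+ and return the declaration count."""
        count = 0
        while self.peek() is not None:
            if self.peek() == "int":
                self.expect("int")
                self.ident()
                self.expect(";")
            else:
                self.expect("void")
                self.take_if(FUNC.fullmatch, "func_name")
                self.expect("{")
                self.block()
            count += 1
        if count == 0:
            raise SyntaxError("a program needs at least one declaration")
        return count

    def block(self):
        # statement* "}"
        while self.peek() != "}":
            self.statement()
        self.expect("}")

    def statement(self):
        tok = self.peek()
        if tok in ("if(", "while("):
            self.pos += 1
            self.expr()
            self.expect("){")
            self.block()
            return
        if tok == "asm":
            self.pos += 1
            self.take_if(str.isdigit, "integer")
        elif tok is not None and FUNC.fullmatch(tok):
            self.pos += 1
        else:  # assign_expr
            if tok == DEREF:
                self.pos += 1
            self.ident()
            self.expect("=")
            self.expr()
        self.expect(";")

    def expr(self):
        # unary (op unary)?
        self.unary()
        if self.peek() in OPS:
            self.pos += 1
            self.unary()

    def unary(self):
        tok = self.peek()
        if tok in (DEREF, "&"):
            self.pos += 1
            self.ident()
        elif tok == "(":
            self.pos += 1
            self.expr()
            self.expect(")")
        else:
            self.take_if(lambda t: IDENT.fullmatch(t) or t.isdigit(),
                         "identifier or integer")


SAMPLE = """
int x ;
int y ;
void bump() { x = x + 1 ; }
void main() {
  x = 0 ;
  while( x < 10 ){ bump() ; }
  if( x == 10 ){ *(int*) y = ( x << 1 ) ; }
  asm 144 ;
}
"""

declarations = Parser(SAMPLE).program()
```

Because every terminal is a whole whitespace-delimited word, the tokenizer here is a single call to split(), which is the point of the mega-token design.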

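Additionally, the atoi() function doubles as a hash function: applied to any token, it returns the actual value for an integer literal and a hash-like number for keywords and identifiers, which shrinks the tokenizer even further.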
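To make the atoi() idea concrete, here is a small Python sketch of the classic atoi loop applied to arbitrary text. The function name atoi_hash and the 16-bit mask are assumptions chosen to mimic a 16-bit register; this illustrates the principle and is not a transcription of SectorC's assembly.

```python
def atoi_hash(token: str) -> int:
    """Run the textbook atoi loop over any text, keeping only 16 bits."""
    n = 0
    for ch in token:
        # Same arithmetic for digits and non-digits: n = n * 10 + (ch - '0').
        n = (n * 10 + (ord(ch) - ord("0"))) & 0xFFFF
    return n


literal = atoi_hash("123")   # digits give the literal's value: 123
keyword = atoi_hash("int")   # non-digits give a hash-like number: 6388
semi = atoi_hash(";")        # ord(';') - ord('0') = 11
```

Since one loop serves both purposes, a compiler built this way can compare token values against precomputed constants instead of comparing strings. As with any hash, two different tokens can in principle map to the same number.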

To shrink the assembly code, Xorvoid experimented with several approaches, such as “byte-threaded code,” which aligns addresses on a 2-byte boundary so that a single byte is enough to address any location in the 512-byte sector. Ultimately, the creator of SectorC got the code to fit through a collection of tricks and by repeatedly simplifying its structure.
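The addressing trick behind byte-threaded code can be sketched in a few lines of Python. If every routine starts at an even offset within a 512-byte region, the offset divided by two fits in one byte, and a program becomes a sequence of such bytes. The offsets, routine names, and the tiny stack machine below are invented for illustration and do not come from SectorC.

```python
REGION_SIZE = 512  # bytes addressable with one byte when aligned to 2


def encode_address(offset: int) -> int:
    """Pack a 2-byte-aligned offset into a single byte."""
    if offset % 2 != 0 or not 0 <= offset < REGION_SIZE:
        raise ValueError(f"offset {offset} is not an aligned in-range address")
    return offset >> 1


def decode_address(byte: int) -> int:
    """Recover the offset from its one-byte encoding."""
    return byte << 1


def push_one(stack):
    stack.append(1)


def double_top(stack):
    stack.append(stack.pop() * 2)


# Hypothetical routine table keyed by even offsets inside the region.
ROUTINES = {0x100: push_one, 0x1F4: double_top}


def run(thread: bytes) -> list:
    """Execute a thread: each byte names the routine at byte * 2."""
    stack = []
    for byte in thread:
        ROUTINES[decode_address(byte)](stack)
    return stack


thread = bytes([encode_address(0x100),
                encode_address(0x1F4),
                encode_address(0x1F4)])
result = run(thread)  # push 1, double, double
```

Each call in the thread costs one byte instead of a multi-byte call instruction, at the price of a small dispatch loop and the alignment constraint.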

In the end, this small project demonstrates that even with significant constraints, impressive results can be achieved. SectorC can be useful for those who want to explore x86-16 BIOS functions and the x86 machine model without having to learn a lot of assembly language first. Such impressive challenges often lead to creative and innovative solutions, leaving technical enthusiasts like me in awe.

If you’re curious and want to learn more about this surprising project, I invite you to discover SectorC here: https://xorvoid.com/sectorc.html

Charles F Flores

With over three years of in-depth experience working in technical fields, Charles is a master content writer who loves writing about Linux and Mac at Easy Tech Tutorials.
