

Documenting everything about OCaml



The OCaml compiler is a complicated piece of software. Below is an attempt to document information that could not fit easily into the codebase, including relevant papers. Feel free to break this up into pages as the need arises.


Interesting Branches of the Compiler


  • ocamlc -config: show all configuration parameters for the compiler. Very useful.


See also Runtime


Compiler Internals

  • hacking.adoc: a basic guide to the compiler’s internals.


The compiler driver, in the /driver directory, runs the entire compilation process from start to finish. It has two entry points, one for the native compiler and one for the bytecode compiler, which set up two separate execution paths through the code.

Both paths go through the file, which handles PPX rewriters. This file dumps the current parsed AST, calls a given PPX executable, and reloads the resulting AST.
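The dump, call, reload round trip can be sketched as follows. This is a toy, not the compiler's code: it assumes a hypothetical three-constructor AST instead of Parsetree, writes it with the standard library's `Marshal` without the compiler's magic-number header, and replaces the external PPX executable with an in-process stand-in function (`fake_ppx`, hypothetical) so the example runs on its own.

```ocaml
(* Toy AST: integers, additions and extension nodes such as [%answer]. *)
type expr = Int of int | Add of expr * expr | Ext of string

(* Dump an AST to [file] in binary (marshalled) form. *)
let dump_ast file (ast : expr) =
  let oc = open_out_bin file in
  Marshal.to_channel oc ast [];
  close_out oc

(* Reload a binary AST written by [dump_ast]. *)
let load_ast file : expr =
  let ic = open_in_bin file in
  let ast = (Marshal.from_channel ic : expr) in
  close_in ic;
  ast

(* Hypothetical stand-in for an external PPX executable: it reads
   [input], expands [%answer] into the constant 42, and writes [output]. *)
let fake_ppx ~input ~output =
  let rec rewrite = function
    | Ext "answer" -> Int 42
    | Add (a, b) -> Add (rewrite a, rewrite b)
    | e -> e
  in
  dump_ast output (rewrite (load_ast input))

(* Driver side: dump the parsed AST, run the rewriter, reload the result. *)
let apply_rewriter (ast : expr) : expr =
  let input = Filename.temp_file "ppx" ".in" in
  let output = Filename.temp_file "ppx" ".out" in
  dump_ast input ast;
  fake_ppx ~input ~output;
  let result = load_ast output in
  Sys.remove input;
  Sys.remove output;
  result
```

For example, `apply_rewriter (Add (Int 1, Ext "answer"))` yields `Add (Int 1, Int 42)`. Going through files is what lets the rewriter be an arbitrary separate program.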

The two compilation files are for bytecode and for native code. Both files pipe the different kinds of data through all the compilation stages. Native compilation can use either clambda (naive) or flambda (optimized), while bytecode compilation currently has only one mode, which is equivalent to clambda compilation.
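As a rough map of the two execution paths, the stages described on this page can be listed per target. The stage names are informal labels taken from the sections below, not functions in /driver, and the `target` type is hypothetical.

```ocaml
(* Which compiler is being run; [flambda] selects the optional optimizer. *)
type target = Bytecode | Native of { flambda : bool }

(* Stages a compilation unit passes through, in order, as described on
   this page. "ppx" only does work when rewriters are given with -ppx. *)
let stages (target : target) : string list =
  let front = [ "parse"; "ppx"; "typecheck"; "lambda" ] in
  match target with
  | Bytecode -> front @ [ "bytecode" ]
  | Native { flambda } ->
    front
    @ (if flambda then [ "flambda"; "clambda" ] else [ "clambda" ])
    @ [ "cmm"; "register allocation"; "machine code" ]
```

Both paths share the front end up to Lambda and diverge only afterwards.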


The parser converts OCaml source code into an abstract syntax tree (AST) representation (parsing/parsetree.mli).
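To illustrate what "source text to AST" means, here is a hand-written recursive-descent parser for a tiny arithmetic language. It is only an analogy: the real parser is generated from a grammar (parsing/parser.mly) and produces Parsetree values, while the `expr` type and `parse` function below are hypothetical.

```ocaml
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

exception Parse_error of string

(* Grammar: expr ::= term ('+' term)*
            term ::= atom ('*' atom)*
            atom ::= int | ident | '(' expr ')'   *)
let parse (s : string) : expr =
  let pos = ref 0 in
  let len = String.length s in
  let rec skip () = if !pos < len && s.[!pos] = ' ' then (incr pos; skip ()) in
  (* Look at the next non-space character without consuming it. *)
  let peek () = skip (); if !pos < len then Some s.[!pos] else None in
  let expect c =
    if peek () = Some c then incr pos
    else raise (Parse_error (Printf.sprintf "expected '%c' at %d" c !pos))
  in
  let take_while p =
    let start = !pos in
    while !pos < len && p s.[!pos] do incr pos done;
    String.sub s start (!pos - start)
  in
  let is_digit c = c >= '0' && c <= '9' in
  let is_alpha c = (c >= 'a' && c <= 'z') || c = '_' in
  (* '+' binds looser than '*', so [expr] is built out of [term]s. *)
  let rec expr () =
    let lhs = ref (term ()) in
    while peek () = Some '+' do incr pos; lhs := Add (!lhs, term ()) done;
    !lhs
  and term () =
    let lhs = ref (atom ()) in
    while peek () = Some '*' do incr pos; lhs := Mul (!lhs, atom ()) done;
    !lhs
  and atom () =
    match peek () with
    | Some c when is_digit c -> Int (int_of_string (take_while is_digit))
    | Some c when is_alpha c -> Var (take_while is_alpha)
    | Some '(' -> incr pos; let e = expr () in expect ')'; e
    | _ -> raise (Parse_error (Printf.sprintf "unexpected input at %d" !pos))
  in
  let e = expr () in
  if peek () <> None then raise (Parse_error "trailing input");
  e
```

For example, `parse "1 + 2 * x"` returns `Add (Int 1, Mul (Int 2, Var "x"))`: precedence is already resolved in the tree, so later stages never see the raw text.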


PPX rewriters are separate executables that read a binary AST, modify parts of it as needed, and write out a binary AST for the compiler to reload.
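Inside a rewriter, the usual structure is a mapper over the AST that rewrites only the nodes it cares about and recurses through the rest. The sketch below imitates the open-recursion style of compiler-libs' `Ast_mapper` (a record of functions, each receiving the mapper itself, plus a `default_mapper`), but on a hypothetical toy AST; it is not that module's real interface.

```ocaml
type expr = Int of int | Add of expr * expr | Neg of expr | Ext of string

(* Each field receives the whole mapper so overrides apply recursively. *)
type mapper = { expr : mapper -> expr -> expr }

(* Identity traversal: rebuild every node, recursing through [m]. *)
let default_mapper = {
  expr = (fun m e ->
    match e with
    | Int _ | Ext _ -> e
    | Add (a, b) -> Add (m.expr m a, m.expr m b)
    | Neg a -> Neg (m.expr m a));
}

(* Hypothetical rewriter: expand [%answer] to 42, defer everything else. *)
let answer_mapper = {
  expr = (fun m e ->
    match e with
    | Ext "answer" -> Int 42
    | _ -> default_mapper.expr m e);
}
```

Calling `answer_mapper.expr answer_mapper (Neg (Add (Ext "answer", Int 1)))` gives `Neg (Add (Int 42, Int 1))`. Passing the mapper explicitly is what lets an override deep in the tree take effect without rewriting the traversal.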


The typechecker transforms the plain AST into a typed AST (typing/typedtree.mli).
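The essential shape of that transformation, untyped tree in and type-annotated tree out, can be shown on a toy language. This sketch uses simple monomorphic checking rather than the inference OCaml performs, and all type and function names are hypothetical; the `ty` field only loosely mirrors the type attached to each expression node in the typed tree.

```ocaml
type ty = TInt | TBool

(* Untyped AST, as produced by the parser. *)
type expr =
  | Int of int
  | Bool of bool
  | Add of expr * expr
  | Eq of expr * expr
  | If of expr * expr * expr

(* Typed AST: every node carries its type. *)
type texpr = { desc : tdesc; ty : ty }
and tdesc =
  | TInt_lit of int
  | TBool_lit of bool
  | TAdd of texpr * texpr
  | TEq of texpr * texpr
  | TIf of texpr * texpr * texpr

exception Type_error of string

let rec typecheck (e : expr) : texpr =
  let expect ty te = if te.ty <> ty then raise (Type_error "type mismatch") in
  match e with
  | Int n -> { desc = TInt_lit n; ty = TInt }
  | Bool b -> { desc = TBool_lit b; ty = TBool }
  | Add (a, b) ->
    let ta = typecheck a and tb = typecheck b in
    expect TInt ta; expect TInt tb;
    { desc = TAdd (ta, tb); ty = TInt }
  | Eq (a, b) ->
    let ta = typecheck a and tb = typecheck b in
    expect ta.ty tb;  (* both sides must have the same type *)
    { desc = TEq (ta, tb); ty = TBool }
  | If (c, t, f) ->
    let tc = typecheck c and tt = typecheck t and tf = typecheck f in
    expect TBool tc; expect tt.ty tf;
    { desc = TIf (tc, tt, tf); ty = tt.ty }
```

A program such as `Add (Int 1, Bool true)` is rejected with `Type_error`; anything accepted comes back with a type on every node.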


After typechecking, if the program has not been rejected, types are mostly erased, keeping only the information relevant to optimizations. The resulting representation (lambda/lambda.mli) is leaner and easier to manipulate than the typed AST.
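A toy version of this erasure step follows. It assumes a small typed tree (redefined here so the block stands alone) and a hypothetical Lambda-like target with invented constructor and primitive names. Booleans become the integers 0 and 1, matching OCaml's runtime representation of `false` and `true`, and the only type information consulted is the operand type of `=`, used once to pick a specialized comparison primitive before being dropped.

```ocaml
type ty = TInt | TBool | TFloat

(* Typed input tree: every node carries its type. *)
type texpr = { desc : tdesc; ty : ty }
and tdesc =
  | TInt_lit of int
  | TBool_lit of bool
  | TFloat_lit of float
  | TAdd of texpr * texpr
  | TEq of texpr * texpr
  | TIf of texpr * texpr * texpr

(* Hypothetical Lambda-like IR: no types anywhere. *)
type lambda =
  | Const of int
  | Const_float of float
  | Prim of string * lambda list   (* invented primitive names *)
  | Ifthenelse of lambda * lambda * lambda

let rec erase (e : texpr) : lambda =
  match e.desc with
  | TInt_lit n -> Const n
  | TBool_lit b -> Const (if b then 1 else 0)
  | TFloat_lit x -> Const_float x
  | TAdd (a, b) -> Prim ("addint", [ erase a; erase b ])
  | TEq (a, b) ->
    (* Type-directed specialization: the last use of [ty] for this node. *)
    let prim = match a.ty with TInt | TBool -> "eqint" | TFloat -> "eqfloat" in
    Prim (prim, [ erase a; erase b ])
  | TIf (c, t, f) -> Ifthenelse (erase c, erase t, erase f)
```

The output type has no `ty` field at all, which is what makes later passes simpler to write.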

Pattern Matching

Pattern matching uses a fairly complex algorithm (lambda/matching.mli) to compile potentially complex patterns into simpler, efficient Lambda code.
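To give a feel for the problem, here is a toy match compiler that turns a list of clauses over tuples of integer constants and wildcards into a decision tree, testing each column at most once along any path. It is an illustration of the general technique only, not the algorithm in lambda/matching.ml, which also handles constructors, or-patterns, guards and much more; every type and function name here is hypothetical. `naive` gives the reference first-match semantics used to check it.

```ocaml
type pat = Wild | Const of int

type tree =
  | Leaf of int                              (* index of the matched clause *)
  | Fail                                     (* no clause matches *)
  | Switch of int * (int * tree) list * tree (* column, cases, default *)

(* One clause: a pattern per column, plus the clause index. *)
type row = pat array * int

let rec compile (rows : row list) : tree =
  match rows with
  | [] -> Fail
  | (pats, action) :: _ ->
    let n = Array.length pats in
    (* First column where the first clause actually tests something. *)
    let rec first_const i =
      if i >= n then None
      else match pats.(i) with Const _ -> Some i | Wild -> first_const (i + 1)
    in
    match first_const 0 with
    | None -> Leaf action  (* first clause is all wildcards: it matches *)
    | Some col ->
      (* Distinct constants tested in this column, in clause order. *)
      let consts =
        List.fold_left (fun acc (p, _) ->
          match p.(col) with
          | Const c when not (List.mem c acc) -> acc @ [ c ]
          | _ -> acc) [] rows
      in
      (* Once a column is decided, replace it by a wildcard. *)
      let erase (p, a) = let p' = Array.copy p in p'.(col) <- Wild; (p', a) in
      (* Clauses still possible when the column equals [c]. *)
      let specialize c =
        List.filter_map (fun ((p, _) as r) ->
          match p.(col) with
          | Const c' when c' = c -> Some (erase r)
          | Wild -> Some (erase r)
          | Const _ -> None) rows
      in
      (* Clauses still possible when the column matches no listed constant. *)
      let default =
        List.filter_map (fun ((p, _) as r) ->
          match p.(col) with Wild -> Some (erase r) | Const _ -> None) rows
      in
      Switch (col, List.map (fun c -> (c, compile (specialize c))) consts,
              compile default)

let rec eval (t : tree) (v : int array) : int option =
  match t with
  | Leaf a -> Some a
  | Fail -> None
  | Switch (col, cases, default) ->
    (match List.assoc_opt v.(col) cases with
     | Some sub -> eval sub v
     | None -> eval default v)

(* Reference semantics: try clauses top to bottom. *)
let naive (rows : row list) (v : int array) : int option =
  List.find_map (fun (p, a) ->
    if Array.for_all2 (fun pat x -> match pat with Wild -> true | Const c -> c = x) p v
    then Some a else None) rows

(* match (x, y) with (0, _) -> 0 | (_, 1) -> 1 | (2, 2) -> 2 | _ -> 3 *)
let example_rows : row list =
  [ ([| Const 0; Wild |], 0);
    ([| Wild; Const 1 |], 1);
    ([| Const 2; Const 2 |], 2);
    ([| Wild; Wild |], 3) ]
```

On `example_rows`, the tree tests `x` first and then `y` only where needed, and agrees with `naive` on every input.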


Flambda is an optional, additional layer of optimization, residing in /middle_end.
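Flambda's real transformations (inlining and specialization among others) are far richer than anything that fits here. As a flavor of the kind of rewriting an optimizing middle end performs, the sketch below does only constant propagation and folding on a hypothetical toy language; none of its names come from /middle_end.

```ocaml
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr

(* [env] maps variables to constants known at this point. *)
let rec simplify (env : (string * int) list) (e : expr) : expr =
  match e with
  | Int _ -> e
  | Var x -> (match List.assoc_opt x env with Some n -> Int n | None -> e)
  | Add (a, b) ->
    (match simplify env a, simplify env b with
     | Int x, Int y -> Int (x + y)          (* fold constant additions *)
     | a', b' -> Add (a', b'))
  | Let (x, bound, body) ->
    (match simplify env bound with
     | Int n -> simplify ((x, n) :: env) body  (* binding disappears *)
     | bound' ->
       (* [x] now shadows any outer constant of the same name. *)
       Let (x, bound', simplify (List.remove_assoc x env) body))
```

For example, `simplify [] (Let ("x", Int 2, Add (Var "x", Int 3)))` returns `Int 5`, while a binding to an unknown value such as `Var "z"` is kept and only its body is simplified.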


Clambda is an expansion of the Lambda AST. It adds lower-level concerns, such as explicit closures.
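"Explicit closures" means that each function value becomes a code pointer plus a block holding its captured free variables, and the function body reads those variables out of that block. The sketch below performs this closure conversion on a toy lambda calculus. The source and target types, the `fun_N` labels and the global `functions` table are all hypothetical and do not mirror Clambda's actual constructors.

```ocaml
(* Source: lambda terms with named variables. *)
type lam =
  | Var of string
  | Int of int
  | Add of lam * lam
  | Fun of string * lam   (* one-argument function *)
  | App of lam * lam

(* Target: closures are explicit (code label, captured values) pairs. *)
type clam =
  | CVar of string
  | CInt of int
  | CAdd of clam * clam
  | CEnvField of int                    (* i-th captured variable *)
  | CMakeClosure of string * clam list  (* code label, captured values *)
  | CApply of clam * clam

(* Free variables of a term, without duplicates. *)
let rec free_vars = function
  | Var x -> [ x ]
  | Int _ -> []
  | Add (a, b) | App (a, b) ->
    let fa = free_vars a in
    fa @ List.filter (fun v -> not (List.mem v fa)) (free_vars b)
  | Fun (x, body) -> List.filter (fun v -> v <> x) (free_vars body)

let rec index_of x i = function
  | [] -> None
  | y :: ys -> if x = y then Some i else index_of x (i + 1) ys

(* Lifted function bodies: (label, parameter, converted body). *)
let counter = ref 0
let functions : (string * string * clam) list ref = ref []

(* [env] lists the variables captured by the enclosing closure, in order. *)
let rec convert (env : string list) (t : lam) : clam =
  match t with
  | Var x -> (match index_of x 0 env with Some i -> CEnvField i | None -> CVar x)
  | Int n -> CInt n
  | Add (a, b) -> CAdd (convert env a, convert env b)
  | App (f, a) -> CApply (convert env f, convert env a)
  | Fun (x, body) ->
    let fvs = free_vars t in
    incr counter;
    let label = Printf.sprintf "fun_%d" !counter in
    (* Inside the body, captured variables are read from the environment. *)
    functions := (label, x, convert fvs body) :: !functions;
    (* At the creation site, captured values come from the current scope. *)
    CMakeClosure (label, List.map (fun v -> convert env (Var v)) fvs)
```

Converting `Fun ("x", Fun ("y", Add (Var "x", Var "y")))` lifts the inner body to `fun_2` as `CAdd (CEnvField 0, CVar "y")`: the outer parameter `x` has become slot 0 of the inner closure's environment, filled in when `fun_1` builds that closure.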


Cmm is an extremely low-level language, concerned with machine-level (assembly-like) code and its optimization. At this level, the original high-level OCaml code is hard to recognize.
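One reason the source becomes hard to recognize is value representation. OCaml integers are tagged: `n` is stored as the machine word `2n+1`, so at this level an OCaml `a + b` turns into word arithmetic that corrects for the tags. The toy lowering below shows that arithmetic; the `expr` and `cmm` types, `lower` and `eval` are hypothetical and much simpler than the real Cmm, and OCaml's own `int` stands in for a machine word.

```ocaml
(* High-level OCaml integer expressions. *)
type expr = Int of int | IVar of string | Plus of expr * expr | Minus of expr * expr

(* Toy machine-level code operating on raw (tagged) words. *)
type cmm = Const of int | Var of string | Addi of cmm * cmm | Subi of cmm * cmm

let tag n = (n lsl 1) lor 1  (* n  ->  2n + 1 *)
let untag w = w asr 1        (* 2n + 1  ->  n *)

let rec lower = function
  | Int n -> Const (tag n)
  | IVar x -> Var x  (* variables already hold tagged words *)
  (* (2a+1) + (2b+1) - 1 = 2(a+b) + 1 *)
  | Plus (a, b) -> Subi (Addi (lower a, lower b), Const 1)
  (* (2a+1) - (2b+1) + 1 = 2(a-b) + 1 *)
  | Minus (a, b) -> Addi (Subi (lower a, lower b), Const 1)

(* Evaluate machine-level code; [env] maps variables to tagged words. *)
let rec eval env = function
  | Const w -> w
  | Var x -> List.assoc x env
  | Addi (a, b) -> eval env a + eval env b
  | Subi (a, b) -> eval env a - eval env b
```

For instance, `lower (Plus (Int 1, Int 2))` is `Subi (Addi (Const 3, Const 5), Const 1)`, and evaluating it yields the word `7`, which untags to `3`.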

Register Coloring
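Register allocation by coloring treats virtual registers as nodes of an interference graph, with an edge between two registers that are live at the same time, and assigns each node one of `k` machine registers so that neighbors never share one. The sketch below is plain greedy coloring with spilling, assuming a symmetric graph given as an edge list; production allocators add ordering heuristics, spill-cost estimates and move coalescing, and nothing here reflects the native backend's actual code.

```ocaml
module IntMap = Map.Make (Int)
module IntSet = Set.Make (Int)

type color = Reg of int | Spill

(* Build a symmetric interference graph from a list of edges. *)
let graph_of_edges (edges : (int * int) list) : IntSet.t IntMap.t =
  let add a b g =
    let s = Option.value (IntMap.find_opt a g) ~default:IntSet.empty in
    IntMap.add a (IntSet.add b s) g
  in
  List.fold_left (fun g (a, b) -> add a b (add b a g)) IntMap.empty edges

(* Visit nodes in increasing order; give each the lowest register not
   used by an already-colored neighbor, or spill it if none of the [k]
   registers is free. *)
let color_graph (k : int) (graph : IntSet.t IntMap.t) : color IntMap.t =
  IntMap.fold (fun v neighbours assignment ->
    let used =
      IntSet.fold (fun n acc ->
        match IntMap.find_opt n assignment with
        | Some (Reg r) -> r :: acc
        | _ -> acc) neighbours []
    in
    let rec first_free r =
      if r >= k then Spill
      else if List.mem r used then first_free (r + 1)
      else Reg r
    in
    IntMap.add v (first_free 0) assignment)
    graph IntMap.empty
```

With two registers, the triangle `0-1-2` cannot be colored, so node `2` is spilled to memory, while node `3`, which only interferes with `0`, still gets a register.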


Finally, the native compiler produces the actual machine code.