The OCaml compiler is a complicated piece of software. Below is an attempt to document information that could not fit easily into the codebase, including relevant papers. Feel free to break this up into pages as the need arises.
- Runtime memory layout from the Real World OCaml book
- Presentation on OCaml Internals (pdf): excellent presentation on how OCaml is built on the inside.
- Exception handling in OCaml: a Stack Overflow answer detailing how exceptions work in OCaml.
- The OCaml compiler pipeline
Interesting Branches of the Compiler
- ocamlc -config: shows all configuration parameters of the compiler. Very useful.
- hacking.adoc: a basic guide to the compiler’s internals.
The compiler driver, residing in the /driver directory, runs the entire compilation process from start to finish. The two entry points into the system are optmain.ml for the native compiler and main.ml for the bytecode compiler, which set up two separate execution paths through the code.
Both paths go through the pparse.ml file, which handles PPX rewriters. This file dumps the current parsed AST, calls a given PPX executable, and reloads the resulting AST.
The two compilation files are compile.ml for bytecode and optcompile.ml for native. Both files pass the program through each of the compilation stages in turn. Native compilation offers a choice between clambda (naive) and flambda (optimized) compilation. Bytecode compilation currently has only one mode, which is equivalent to …
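The following sketch is a hypothetical model of the two paths the driver sets up. The step names are illustrative labels only, not compiler identifiers. It reflects the Flambda middle end of the 4.x era, where the optimized Flambda term is converted back to Clambda before Cmm is generated. That conversion is why Clambda appears on both native paths.

```ocaml
(* Hypothetical model of the driver's execution paths; the strings are
   labels for the stages described on this page, not real module names. *)
type native_mode = Clambda | Flambda
type target = Bytecode | Native of native_mode

(* Steps shared by the bytecode and native paths *)
let front_end = [ "Parsetree"; "PPX rewriters"; "Typedtree"; "Lambda" ]

let steps = function
  | Bytecode -> front_end @ [ "bytecode" ]
  | Native Clambda -> front_end @ [ "Clambda"; "Cmm"; "assembly" ]
  | Native Flambda -> front_end @ [ "Flambda"; "Clambda"; "Cmm"; "assembly" ]

let describe target = String.concat " -> " (steps target)
```

For example, `describe Bytecode` yields `"Parsetree -> PPX rewriters -> Typedtree -> Lambda -> bytecode"`.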
The parser converts OCaml syntax to an abstract syntax tree (AST) representation (parsing/parsetree.mli).
PPX rewriters are separate executables that parse binary AST, modify certain parts as needed, and spit out binary AST for the compiler to reload.
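The round trip between the compiler and a PPX can be modeled in a few lines. Everything in this sketch is hypothetical:

- `expr` stands in for `Parsetree.expression`.
- `Ext` stands in for an extension node such as `[%answer]`.
- `Marshal` stands in for the compiler's binary AST format.

Real rewriters are usually written against compiler-libs' `Ast_mapper` or the ppxlib library, and they operate on the full Parsetree.

```ocaml
(* Hypothetical, heavily reduced stand-in for the Parsetree *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Ext of string (* plays the role of an extension node like [%answer] *)

(* The rewriting pass: replace [%answer] with 42 and leave everything else alone *)
let rec rewrite = function
  | Ext "answer" -> Int 42
  | Add (a, b) -> Add (rewrite a, rewrite b)
  | (Int _ | Var _ | Ext _) as e -> e

(* What the separate PPX executable does: read binary AST, rewrite, write it back.
   Marshal is only an assumed stand-in for the real binary AST format. *)
let ppx_process (input : string) : string =
  let ast : expr = Marshal.from_string input 0 in
  Marshal.to_string (rewrite ast) []

(* What the driver side does: dump the parsed AST, call the rewriter, reload *)
let run_ppx (ast : expr) : expr =
  let dumped = Marshal.to_string ast [] in
  let rewritten = ppx_process dumped in
  (Marshal.from_string rewritten 0 : expr)
```

Here `run_ppx (Add (Var "x", Ext "answer"))` returns `Add (Var "x", Int 42)`.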
The typechecker transforms the plain AST to typechecked AST (typing/typedtree.mli).
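To illustrate the difference between the two trees, here is a toy typechecker. It maps an untyped tree to one where every node records its type. This loosely mirrors how Typedtree expression nodes carry an `exp_type` field. The languages and names are hypothetical, and real OCaml type inference (with unification and polymorphism) is far more involved.

```ocaml
(* Hypothetical untyped tree, standing in for the Parsetree *)
type expr =
  | Int of int
  | Bool of bool
  | Add of expr * expr
  | If of expr * expr * expr

type ty = TInt | TBool

(* Hypothetical typed tree: every node carries its type *)
type texpr = { desc : tdesc; ty : ty }

and tdesc =
  | TIntLit of int
  | TBoolLit of bool
  | TAdd of texpr * texpr
  | TIf of texpr * texpr * texpr

exception Type_error of string

(* Either rejects the program or returns the annotated tree *)
let rec typecheck (e : expr) : texpr =
  match e with
  | Int n -> { desc = TIntLit n; ty = TInt }
  | Bool b -> { desc = TBoolLit b; ty = TBool }
  | Add (a, b) ->
      let ta = typecheck a and tb = typecheck b in
      if ta.ty <> TInt || tb.ty <> TInt then raise (Type_error "+ expects ints");
      { desc = TAdd (ta, tb); ty = TInt }
  | If (c, t, f) ->
      let tc = typecheck c in
      if tc.ty <> TBool then raise (Type_error "condition must be bool");
      let tt = typecheck t and tf = typecheck f in
      if tt.ty <> tf.ty then raise (Type_error "branches have different types");
      { desc = TIf (tc, tt, tf); ty = tt.ty }
```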
After typechecking, if a program isn’t rejected, types are mostly erased from the AST except information relevant to optimizations. The resulting AST (lambda/lambda.mli) is leaner and easier to manipulate than the typed AST.
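One concrete effect of this erasure is on records. A field access by name only makes sense with the record's type at hand. Once the type is consulted, the access becomes a positional read from a memory block. The sketch below models that step with hypothetical toy types. In the real lambda/lambda.mli, the corresponding primitives are `Pmakeblock` and `Pfield`.

```ocaml
(* Field labels of a record type, in declaration order *)
type record_type = string list

(* Hypothetical typed expressions: field accesses still know the record type *)
type texpr =
  | TConst of int
  | TVar of string
  | TRecord of record_type * texpr list
  | TField of texpr * record_type * string

(* Hypothetical Lambda-like IR: no labels, only blocks and field offsets *)
type lambda =
  | Lconst of int
  | Lvar of string
  | Lmakeblock of lambda list
  | Lfield of lambda * int

let index_of label labels =
  let rec go i = function
    | [] -> invalid_arg ("unknown label " ^ label)
    | l :: rest -> if l = label then i else go (i + 1) rest
  in
  go 0 labels

(* Erase types: record labels are resolved to offsets and then forgotten *)
let rec erase = function
  | TConst n -> Lconst n
  | TVar x -> Lvar x
  | TRecord (_labels, fields) -> Lmakeblock (List.map erase fields)
  | TField (e, labels, label) -> Lfield (erase e, index_of label labels)
```

With `point = ["x"; "y"]`, the typed access `p.y` becomes `Lfield (Lvar "p", 1)`.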
Pattern matching is compiled by a fairly complex algorithm (lambda/matching.mli), which converts potentially complex patterns into simpler, efficient Lambda code.
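The simplest case shows the flavor of the output. Constant constructors are represented at runtime as the integers 0 .. n-1 in declaration order, so a match on them can become a switch on that integer. The sketch below compiles a first-match clause list over such constructors into a jump table. The names are hypothetical, and the real algorithm also handles nested patterns, guards, and non-constant constructors.

```ocaml
(* Hypothetical patterns over constant constructors, numbered 0 .. n-1 *)
type pattern = PConst of int | PAny | POr of pattern * pattern

let rec matches tag = function
  | PConst c -> c = tag
  | PAny -> true
  | POr (p, q) -> matches tag p || matches tag q

(* For each tag, pick the action of the first clause that accepts it.
   None marks a tag no clause covers (a non-exhaustive match). *)
let compile_switch ~num_constructors (clauses : (pattern * 'a) list) :
    'a option array =
  Array.init num_constructors (fun tag ->
      List.find_opt (fun (p, _) -> matches tag p) clauses |> Option.map snd)
```

For `type color = Red | Green | Blue`, the match `| Red | Blue -> "red or blue" | _ -> "other"` corresponds to `compile_switch ~num_constructors:3 [ (POr (PConst 0, PConst 2), "red or blue"); (PAny, "other") ]`. That call yields `[| Some "red or blue"; Some "other"; Some "red or blue" |]`.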
Flambda is an optional, additional layer of optimization, residing in /middle_end.
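The sketch below gives a feel for the kind of rewrite an optimizing middle end performs. It does constant propagation and folding on a tiny, hypothetical let-language. It is not Flambda's algorithm or IR; Flambda itself works on a much richer representation and also does inlining and specialization.

```ocaml
(* Hypothetical tiny IR *)
type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr

(* [env] maps variables known to be bound to constants *)
let rec simplify env = function
  | Const n -> Const n
  | Var x -> (
      match List.assoc_opt x env with Some v -> v | None -> Var x)
  | Add (a, b) -> (
      match (simplify env a, simplify env b) with
      | Const x, Const y -> Const (x + y) (* constant folding *)
      | a', b' -> Add (a', b'))
  | Let (x, e, body) -> (
      match simplify env e with
      | Const _ as c ->
          (* propagate the constant; the binding itself becomes dead *)
          simplify ((x, c) :: env) body
      | e' ->
          (* x is shadowed here, so forget any constant previously bound to it *)
          let env' = List.filter (fun (y, _) -> y <> x) env in
          Let (x, e', simplify env' body))
```

For instance, `let x = 2 in x + 3` simplifies to `Const 5`. An unknown binding such as `let y = z in ...` is kept.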
Clambda is an expansion of the Lambda AST. It also includes some more low-level concerns, such as explicit closures.
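"Explicit closures" means that each function value becomes a code pointer paired with an environment of the variables it captures. The sketch below is a toy closure conversion on a hypothetical lambda calculus. It computes each function's free variables, stores them in the closure, and rewrites references to them as environment slot reads. It does not reproduce Clambda's actual constructors.

```ocaml
module S = Set.Make (String)

(* Hypothetical source terms: nested functions may capture variables *)
type term = Var of string | Fun of string * term | App of term * term

(* Hypothetical target terms: every function is an explicit closure *)
type cterm =
  | CVar of string (* a parameter or a top-level variable *)
  | CEnvField of int (* the i-th captured variable of the current closure *)
  | CClosure of { param : string; env : cterm list; body : cterm }
  | CApp of cterm * cterm

let rec free_vars = function
  | Var x -> S.singleton x
  | Fun (x, body) -> S.remove x (free_vars body)
  | App (f, a) -> S.union (free_vars f) (free_vars a)

let rec position x i = function
  | [] -> None
  | y :: rest -> if x = y then Some i else position x (i + 1) rest

(* [env] lists the variables captured by the enclosing closure, in slot order *)
let rec convert env = function
  | Var x -> (
      match position x 0 env with Some i -> CEnvField i | None -> CVar x)
  | App (f, a) -> CApp (convert env f, convert env a)
  | Fun (x, body) as f ->
      let captured = S.elements (free_vars f) in
      CClosure
        {
          param = x;
          (* captured values are looked up in the enclosing scope *)
          env = List.map (fun y -> convert env (Var y)) captured;
          body = convert captured body;
        }
```

Converting `fun x -> fun y -> x y` makes the inner closure capture `x` in slot 0. Its body becomes `CApp (CEnvField 0, CVar "y")`.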
Cmm is a very low-level language, close to machine code (assembly). The backend selects machine instructions and performs low-level optimizations starting from it. At this level, the original high-level OCaml code is hard to recognize.
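Part of the reason the source becomes hard to recognize is value representation. An OCaml `int` n is stored as the tagged machine word 2n+1, and code at this level manipulates those raw words directly. For example, tagged addition can be computed as `a + b - 1` without untagging either operand. The sketch below emulates this arithmetic with ordinary OCaml ints. It is an illustration only and ignores overflow of the top bit.

```ocaml
(* Emulated tagged representation: the integer n is stored as 2n + 1 *)
let tag n = (n lsl 1) lor 1
let untag w = w asr 1

(* (2a+1) + (2b+1) - 1 = 2(a+b) + 1 *)
let add_tagged wa wb = wa + wb - 1

(* (2a+1) - (2b+1) + 1 = 2(a-b) + 1 *)
let sub_tagged wa wb = wa - wb + 1

(* multiplication needs one operand untagged: (2a) * b + 1 = 2(ab) + 1 *)
let mul_tagged wa wb = ((wa - 1) * untag wb) + 1
```

So `tag 21` is `43`, and `untag (add_tagged (tag 20) (tag 22))` is `42`.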
The final output is the actual machine code ultimately produced by the native compiler.