OCamlverse

Documenting everything about OCaml

Concurrency, Parallelism, and Distributed Systems

Concurrency refers to interleaving multiple computations by switching rapidly from one to the other (for example, with green threads), whereas parallelism refers to running computations simultaneously on multiple cores, typically via OS-level threads. Since OCaml 5.0, OCaml supports concurrency through effect handlers and parallelism through domains.

Concurrency

Non-monadic

As of OCaml 5, effect handlers can be used to implement concurrency in direct style instead of using monads. This tends to produce cleaner, easier-to-read code.

  • eio: A concurrency library built on OCaml's effect handlers (OCaml 5+). It also includes a capability-based security feature.
  • miou: A simple alternative scheduler for OCaml 5+ that runs concurrent and/or parallel tasks.
  • MoonPool: Rather than use effects for concurrency, moonpool leverages OCaml threads on top of domains for simplicity.
  • picos: An effort to unify schedulers, so that multiple OCaml schedulers can work with each other.
  • Riot: An in-development library to support actor-based processing (similar to Erlang) on OCaml 5.
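
To make the direct-style idea concrete, here is a minimal sketch of a cooperative round-robin scheduler written with the standard library's Effect module (OCaml 5+). It is not how eio, miou, or any library above is implemented; the names Yield, yield, run, and trace are made up for this illustration.

```ocaml
(* A toy round-robin scheduler using OCaml 5 effect handlers.
   Yield, yield, run and trace are illustrative names only. *)
open Effect
open Effect.Deep

type _ Effect.t += Yield : unit Effect.t

(* Suspend the current task and let the others run. *)
let yield () = perform Yield

(* Records the order in which tasks make progress (most recent first). *)
let trace : string list ref = ref []
let record s = trace := s :: !trace

let run (tasks : (unit -> unit) list) : unit =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  (* Run the next ready task, if any. *)
  let next () =
    match Queue.take_opt ready with
    | Some k -> k ()
    | None -> ()
  in
  let spawn task =
    match_with task ()
      { retc = (fun () -> next ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the suspended task at the back of the queue. *)
              Queue.add (fun () -> continue k ()) ready;
              next ())
          | _ -> None) }
  in
  List.iter (fun t -> Queue.add (fun () -> spawn t) ready) tasks;
  next ()

(* Plain direct-style code: no monads, just a call to yield. *)
let task name () =
  record (name ^ "1");
  yield ();
  record (name ^ "2")

let () = run [ task "a"; task "b" ]
```

After the run, List.rev !trace is ["a1"; "b1"; "a2"; "b2"]: each task runs until it yields, then the other one gets its turn. Real schedulers add I/O, cancellation, and error handling on top of this basic pattern.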

Monadic

These libraries still work in OCaml 5+ and are also compatible with OCaml 4. They generally involve more complex syntax and require working with monads, but in return the types make concurrent operations explicit, which gives stronger static guarantees about concurrency semantics.

  • lwt: a monadic concurrency library. Concurrent code uses monads to express the higher-level abstractions of control flow.
  • Async: another monadic concurrency library developed by Jane Street. This library is covered in Real World OCaml. While the concept is very similar to lwt, small discrepancies make compatibility between the libraries difficult.
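
For readers who have not seen monadic style, the sketch below shows its general shape using a toy continuation-based monad. The Toy module, read_number, and sum_inputs are hypothetical and are not the Lwt or Async API; in Lwt, the corresponding operations are Lwt.return and Lwt.bind.

```ocaml
(* A toy continuation-passing "promise" monad; NOT the Lwt or Async API. *)
module Toy : sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
  val run : 'a t -> ('a -> unit) -> unit
end = struct
  (* A computation is a function that hands its result to a callback. *)
  type 'a t = ('a -> unit) -> unit
  let return x = fun k -> k x
  let bind m f = fun k -> m (fun x -> f x k)
  let run m k = m k
end

(* Binding operator so that sequencing reads like ordinary let. *)
let ( let* ) = Toy.bind

(* Hypothetical "I/O" step; a real library would make this non-blocking. *)
let read_number (s : string) : int Toy.t = Toy.return (int_of_string s)

(* The return type int Toy.t shows that this function is "concurrent". *)
let sum_inputs a b : int Toy.t =
  let* x = read_number a in
  let* y = read_number b in
  Toy.return (x + y)

let result = ref 0
let () = Toy.run (sum_inputs "20" "22") (fun v -> result := v)
```

Every step that may suspend has a type of the form 'a Toy.t and must be sequenced with bind (here via let*). That requirement is both the extra syntax and the extra type information mentioned above.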

Event Loops

  • LUV: Bindings to libuv, the event loop library that powers Node.js. It can also serve as a replacement for the Unix module, allowing for full process control in a system-independent manner.

Parallelism

Domain (thread)-based Parallelism

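OCaml 5.0 introduced domains, which are units of parallel execution, each backed by its own OS thread. The number of domains should generally not exceed the number of CPU cores. Domains allow for true parallelism in OCaml.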

  • Parallel Programming in Multicore OCaml: a great article on using OCaml's multicore capabilities.
  • domainslib: Library for leveraging parallel execution, with work stealing queues.
  • kCAS: Software Transactional Memory (STM) in OCaml. STM allows for programming across threads (domains) via lock-free data structures and interfaces that make the difficult work of parallelism easier for average programmers.
  • MoonPool: Thread pools with work-stealing for domains.
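
As a minimal sketch of what domains provide at the standard library level (OCaml 5+), the code below splits an array into chunks and sums each chunk in its own domain using Domain.spawn and Domain.join. The function parallel_sum and its num_domains parameter are illustrative names; libraries such as domainslib offer task pools that are better suited to real workloads.

```ocaml
(* Sum an array in parallel, one chunk per spawned domain.
   parallel_sum and num_domains are illustrative names. *)
let parallel_sum ~num_domains (a : int array) : int =
  let n = Array.length a in
  (* Ceiling division, so every element falls into some chunk. *)
  let chunk = (n + num_domains - 1) / num_domains in
  let sum_range lo hi =
    let s = ref 0 in
    for i = lo to hi - 1 do
      s := !s + a.(i)
    done;
    !s
  in
  let workers =
    List.init num_domains (fun d ->
      let lo = min n (d * chunk) in
      let hi = min n (lo + chunk) in
      (* Each domain only reads its own slice, so no locking is needed. *)
      Domain.spawn (fun () -> sum_range lo hi))
  in
  (* Domain.join waits for a domain and returns its result. *)
  List.fold_left (fun acc w -> acc + Domain.join w) 0 workers

let total = parallel_sum ~num_domains:4 (Array.init 1000 (fun i -> i + 1))
```

The fixed count of 4 is an arbitrary choice for the example; Domain.recommended_domain_count () reports a sensible upper bound for the current machine.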

Process-Level Parallelism

Pre-5.0, OCaml supported parallelism only by running multiple processes. This option still exists and is supported by many libraries.

  • Parmap: Provides easy-to-use parallel map and fold functions. The library makes use of forking to create short-lived child processes, and memory mapping to feed the data back to the parent process.
  • Parany: Generalized map reduce for multicore computers (unfold, map in parallel, fold). Parany can process in parallel an “infinite” stream of elements (too big to fit in memory). Any Parmap functionality can be reimplemented using parany.
  • hack-parallel: Parallel processing library using shared memory. Used by Facebook’s Hack.
  • lwt-parallel: Lower-level mechanism to create child processes in lwt and have them communicate with the parent via sockets.
  • ForkWork: Similar to Parmap above.
  • By interfacing with external C code through the FFI, OCaml can pass off long-running computations to C threads that run at the same time as OCaml code. This is easier nowadays thanks to Ctypes (see ffi).
  • Nproc: A process pool implementation for OCaml using lwt. Rather than creating or forking processes as needed, preallocates them and sends them units of work as required.
  • Ocamlnet: An enhanced system platform library. It contains the netmulticore library to compute tasks on as many cores of the machine as needed. Among the process-based options it is arguably the most powerful, as it is capable of creating a shared memory region and running a custom-made garbage collector on that region.

Distributed Computing

Distributed computing is similar to process-based parallelism, except that the child processes may be on remote machines. Distributed computing libraries therefore generally support parallelism on a single machine as well.

  • Rpc.Parallel: a library for spawning processes on a cluster of machines, and passing typed messages between them.
  • zmq: Bindings to ZeroMQ, an open-source universal messaging library.
  • Functory: a distributed computing library which facilitates distributed execution of parallelizable computations in a seamless fashion.
  • MPI: Message Passing Interface bindings for OCaml.
  • ocaml-rpc: light library to deal with RPCs in OCaml.
  • distributed: Library for distributed computation in OCaml. Similar to Erlang's model and inspired by Cloud Haskell.

  • reactor (alpha): Actor model for OCaml, similar to Erlang/Elixir.