Interoperability of Elixir
A critical aspect of a programming language is its interoperability with other programming languages: being able to play nice with others. Whether the goal is to reuse legacy code or to gain better performance in numerical computations, interoperating Elixir with C is a common practice [✥]. The two most popular ways of doing that are NIF’s (Native Implemented Functions) and ports, the latter for example through Porcelain.
NIF’s were introduced in Erlang/OTP R13B03 [♠]. NIF’s are Erlang/Elixir functions written in C and loaded dynamically as a shared library, whereas ports are separate programs that run outside the BEAM VM and communicate with it via STDIN/STDOUT. NIF’s tend to be simpler to write because they do not have to encode and decode standard input and output. In certain scenarios, skipping that step also makes them more efficient. However, a segmentation fault in the C code implementing a NIF can crash the BEAM VM, which makes ports the safer choice. [◆, ♣]
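To make the encoding cost of ports concrete, here is a minimal sketch of the framing a port program has to handle. It assumes the port is opened with Erlang's `{packet, 2}` option, under which every message on STDIN/STDOUT is preceded by a 2-byte big-endian length. The function names are hypothetical and are not taken from Expostal or Porcelain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the 2-byte big-endian length header that precedes each message
 * when a port is opened with the {packet, 2} option (assumed here). */
static void encode_packet_header(uint16_t payload_len, unsigned char header[2])
{
    assert(header != NULL);
    header[0] = (unsigned char)((payload_len >> 8) & 0xff); /* high byte first */
    header[1] = (unsigned char)(payload_len & 0xff);        /* then low byte */
}

/* Read the payload length back out of a 2-byte big-endian header. */
static uint16_t decode_packet_header(const unsigned char header[2])
{
    assert(header != NULL);
    return (uint16_t)((header[0] << 8) | header[1]);
}
```

A port program would read two bytes from STDIN, decode the length, read that many payload bytes, and write its reply with the same framing. A NIF avoids this bookkeeping because it receives Erlang terms directly.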
In this tutorial we will look at how to implement NIF’s for our C library of choice, Libpostal (a C library that parses and normalizes street addresses from around the world). If you want to jump into the code right away, here is the link to the full project: https://github.com/SweetIQ/expostal .
Creating NIF’s for Libpostal
We can start by creating a new Elixir project using mix.
mix new expostal
Currently, the recommended way of working with C NIF’s in Elixir is to create Makefiles that get invoked by mix compile.
Project Setup
To make mix compile build the C NIF’s, we can add the following module definition to mix.exs:
defmodule Mix.Tasks.Compile.Libpostal do
Depending on the C library you want to interoperate with and the platforms you develop and deploy on, you might need multiple Makefiles, each targeting a different operating system. In our case, since Libpostal does not run on Windows, we print a warning and exit the program.
Makefile
Next, we can create our Makefile, which compiles the NIF’s defined in src/parser.c and src/expand.c into priv/parser.so and priv/expand.so respectively. With most C libraries, we would only need to put everything inside a single dynamic library (i.e. just priv/your_library.so). But in our case, since Libpostal’s expand function requires loading a machine learning model that the parser function does not need, it is best to keep them separate.
MIX = mix
If the C library you are working with is not installed system-wide (i.e. under /usr/local or /usr), or if you’d like to embed the C library within your project, check out how the hoedown project embeds its C dependency.
Whether to embed the C library or require a system-wide installation is a controversial design decision. [Ω] From the developer of the Node.js binding for Libpostal:
Usually when dynamically linking to a native library, it’s either assumed that the library is installed separately, or that the dependency is included with the binding. Let’s call these the “lean repo” and the “fat repo” approaches respectively. node-postal is an example of a lean repo, whereas a fat repo would be something like node-snappy.
libpostal is a bit trickier than a library like Snappy because it’s not just software - there are also data/model files which need to be downloaded from the web…
I felt that the same argument can be applied to this Elixir binding.
Implementing NIF’s
Now that we’ve finished setting up the build process, it’s time to actually implement those Native Implemented Functions. For the Libpostal parser, our goal is to create an Elixir/Erlang function that calls libpostal_parse_address from the Libpostal C library. The signature of libpostal_parse_address, together with its response type, is as follows:
typedef struct libpostal_address_parser_response {
When passed an address, libpostal_parse_address returns the address components as a libpostal_address_parser_response_t structure. For example, when passed 845 Sherbrooke St W, Montreal, QC H3A 0G4 as the address, together with the default options, the function returns:
num_components: 5,
This is not a very Elixir-esque way of returning values. In Elixir, we can elegantly use a Map type to represent the label-component key-values. We will see how we can do that later.
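The label-component pairing can be sketched in plain C before any NIF machinery is involved. The struct below is a hypothetical stand-in that assumes the response holds two parallel arrays, one of labels and one of components, plus a count. It is not Libpostal's own definition, and `component_for_label` is an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parser response: labels[i] names the
 * kind of value stored in components[i]. */
typedef struct {
    size_t num_components;
    const char **components;
    const char **labels;
} parsed_address_t;

/* Return the component stored under `label`, or NULL when the label is
 * absent. This is the lookup a Map gives the Elixir caller for free. */
static const char *component_for_label(const parsed_address_t *resp,
                                       const char *label)
{
    assert(resp != NULL && label != NULL);
    for (size_t i = 0; i < resp->num_components; i++) {
        if (strcmp(resp->labels[i], label) == 0) {
            return resp->components[i];
        }
    }
    return NULL;
}
```

A NIF would walk the two arrays in the same way, but instead of comparing strings it would insert each label and component into an Erlang map as a key and a value.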
Loading and unloading
In order for the BEAM VM to interact with C functions, we need to register them with the VM. The src/parser.c file starts with the following structure:
A dynamic library implementing NIF’s needs to register itself via the ERL_NIF_INIT macro, providing its namespace, the functions to expose, and a series of functions (load, reload, upgrade, unload) that define the life cycle of the NIF library. [δ]
We are particularly interested in the load and unload functions. When the parser NIF library loads, we need to initialize Libpostal so that it loads a machine learning model shared by process-local threads. We do that by calling the libpostal_setup and libpostal_setup_parser functions.
static int
Similarly, we want to make sure we properly free up resources when the Erlang VM decides to unload the module.
static void
The reload and upgrade functions are implemented as follows:
static int
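The load and unload pairing described above can be sketched without either library installed. In the sketch below, the `fake_*` functions are hypothetical stand-ins for the library's setup and teardown calls, and `load_model` and `unload_model` only mirror the shape of the callbacks: a NIF load callback signals success by returning 0, and any other value makes the VM refuse to load the library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the library's setup/teardown calls, so the
 * sketch compiles without Libpostal or the Erlang headers. */
static bool base_ready = false;
static bool parser_ready = false;
static bool parser_setup_should_fail = false; /* flip to simulate a failure */

static bool fake_setup(void) { base_ready = true; return true; }
static bool fake_setup_parser(void)
{
    if (parser_setup_should_fail) return false;
    parser_ready = true;
    return true;
}
static void fake_teardown(void) { base_ready = false; }
static void fake_teardown_parser(void) { parser_ready = false; }

/* Shape of a load callback: 0 means success, non-zero refuses the load.
 * If the second step fails, undo the first so nothing stays allocated. */
static int load_model(void)
{
    if (!fake_setup()) return 1;
    if (!fake_setup_parser()) {
        fake_teardown();
        return 1;
    }
    return 0;
}

/* Shape of an unload callback: release in reverse order of setup. */
static void unload_model(void)
{
    fake_teardown_parser();
    fake_teardown();
}
```

Rolling back a partial setup is a design choice rather than something the VM requires. It simply avoids leaving a model in memory for a library that never finished loading.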
Implementing parse_address function as NIF
Next, we can finally implement the parse_address function. As we saw earlier, libpostal_parse_address takes an address string as input and emits a custom struct that holds the components and labels. When a user calls parse_address in Elixir, we need to call libpostal_parse_address under the hood. In this case, however, the input is not a C char* but an Elixir string. We need to convert this Elixir string into a C char pointer and then pass it to libpostal_parse_address. The output of libpostal_parse_address is a C struct, but we want to return it as an Elixir/Erlang Map, so that the user can enjoy the elegance of a modern programming language.
Enough said, here’s a carefully commented implementation:
parse_address(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
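The string conversion step deserves a closer look. An Elixir string is a binary, that is, a sequence of bytes with a known length and no terminating NUL, whereas a C function taking a char* expects a NUL-terminated string. A minimal sketch of the copy follows, assuming the NIF has already obtained the byte pointer and size of the binary. The helper name `binary_to_cstring` is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy `size` bytes into a freshly allocated, NUL-terminated buffer.
 * Returns NULL if allocation fails; the caller must free() the result. */
static char *binary_to_cstring(const unsigned char *data, size_t size)
{
    assert(data != NULL || size == 0);
    char *out = malloc(size + 1);  /* one extra byte for the terminator */
    if (out == NULL) {
        return NULL;
    }
    if (size > 0) {
        memcpy(out, data, size);
    }
    out[size] = '\0';
    return out;
}
```

Passing the binary's data pointer straight to a function that expects a C string would read past the end of the buffer, which is why the copy and the explicit terminator are needed. The buffer has to be freed once the C call has returned.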
Implementing parse_address function in Elixir
Now that we have the NIF implemented, we need to create its Elixir counterpart. The Elixir module needs to load the NIF as it initializes and define the signature of the NIF function, as shown below:
defmodule Expostal.Parser do
And there we go, libpostal’s parse_address function can now be invoked inside Elixir:
iex> Expostal.Parser.parse_address("845 Sherbrooke St W, Montreal, QC H3A 0G4")
Summary
This tutorial covered how we can create Elixir NIF’s from scratch, using Expostal (an Elixir binding for Libpostal) as an example. A NIF serves as a bridge between C code and Elixir code, allowing you to call C functions inside Elixir. The steps to create an Elixir NIF are as follows:
- Create a new project and set up the Mix compile task
- Create a Makefile (or several, if multiple operating systems must be supported)
- Implement the NIF’s in C
- Implement the Elixir module counterpart (init and function definitions)
The entire experience is not that much different from implementing a binding for Python or for Node.js. But as Elixir/Erlang is a language that supports concurrent programming by design, one must pay more attention to thread safety when implementing NIF’s. This is a challenge that Node.js binding implementors do not have to worry about, because of its single-threaded design.
If you wish to download the full source code, it is available on GitHub: https://github.com/SweetIQ/expostal . And star the project while you are at it!