A Guide on Generating Erlang Forms

2023-04-21

An advanced Erlang technique is writing code that generates a module. This is possible because the Erlang standard library exposes functions that work with the Erlang AST. Generating forms is different from Elixir, Scheme, or Clojure macros, but honing this skill lets Erlang developers generate code they would otherwise have to write by hand. The technique shows up in erlydtl templates, parse transforms (parse_transform), and elsewhere. It is powerful but has a learning curve. For example, the docs on Erlang’s abstract forms are thorough but opaque. This blog post will break it down and make it easier to understand.

This post will be referencing modules in the Erlang standard library. While not necessary, it is helpful to look through these modules for more context.

The modules used are erl_scan, erl_parse, compile, and code.

The hardest part with Erlang forms is knowing what the form is supposed to look like. Erlang developers think in terms of Erlang code. However, to use this technique, they need to express the Erlang code in forms. We can use a trick to reveal the abstract form for whatever Erlang code we want.

The Trick

Start by writing a throwaway module. You won’t need to include this in your final project; use it to test things in the shell. The module looks like this:

-module(forms).

-export([parse_exp/1, parse_form/1]).

parse_exp(Exp) ->
    {ok, Tokens, _} = erl_scan:string(Exp),
    {ok, Parsed} = erl_parse:parse_exprs(Tokens),
    Parsed.

parse_form(S) ->
    {ok, Tokens, _} = erl_scan:string(S),
    {ok, Parsed} = erl_parse:parse_form(Tokens),
    Parsed.

This post takes code from another blog post. So big shout out!

The gist of this module is that we will be able to write Erlang code in a string and then parse it into Erlang forms. This trick is helpful because thinking about the generated code in plain Erlang rather than Erlang forms is more manageable.

erl_scan:string tokenizes the Erlang code. That output gets fed into erl_parse:parse_exprs(Tokens) for expressions or erl_parse:parse_form(Tokens) for forms. This code will be no surprise if you are familiar with programming language implementations. The first pass is the tokenizer, and the second is the parser, which turns the tokens into an AST.
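To see the two passes separately, here is a small shell sketch that stops after each step. The input "1 + 2." is only an illustrative expression, not something from the throwaway module.

```erlang
%% First pass: the tokenizer turns the source string into a flat token list.
{ok, Tokens, _EndLine} = erl_scan:string("1 + 2.").
%% Tokens is [{integer,1,1},{'+',1},{integer,1,2},{dot,1}]

%% Second pass: the parser turns the tokens into a list of abstract expressions.
{ok, [Expr]} = erl_parse:parse_exprs(Tokens).
%% Expr is {op,1,'+',{integer,1,1},{integer,1,2}}
```

Note the {dot,1} token at the end of the token list. The parser relies on it to know where the expression stops.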

Parsing Expressions

In the Erlang shell we can test out parse_exp:

forms:parse_exp("1.").
%% => [{integer,1,1}]

forms:parse_exp("<<\"hello world\">>.").
%% => [{bin,1,
%%          [{bin_element,1,{string,1,"hello world"},default,default}]}]

forms:parse_exp("hello.").
%% => [{atom,1,hello}]

The period inside the quotes is necessary to denote the end of the expression.

In the above code, we tested what basic expressions look like in Erlang’s Abstract Term Format.

Notice the ‘1’ as the second element in the tuple. That is simply the line number. When generating these forms, it is acceptable to set that number to ‘0’. Expressions tend to follow this structure:

{type, line_number, data}

There are exceptions to this rule, as we saw with a binary. So make sure you are testing each expression to see the underlying structure.
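As a rough illustration, here are a few more shapes that deviate from the three-element pattern. The Parse fun is a local stand-in that repeats the steps of parse_exp, so the snippet can be pasted into a fresh shell on its own.

```erlang
%% Same steps as forms:parse_exp/1, inlined as a fun so this runs on its own.
Parse = fun(Str) ->
            {ok, Tokens, _} = erl_scan:string(Str),
            {ok, Exprs} = erl_parse:parse_exprs(Tokens),
            Exprs
        end.

%% An empty list carries no data: just a type and a line number.
[{nil,1}] = Parse("[].").

%% A non-empty list is a chain of cons cells with separate head and tail elements.
[{cons,1,{integer,1,1},{nil,1}}] = Parse("[1].").

%% An operator has the operator name plus a left and a right operand.
[{op,1,'+',{integer,1,1},{integer,1,2}}] = Parse("1 + 2.").

%% A tuple does follow {type, line_number, data}; its data is a list of elements.
[{tuple,1,[{atom,1,ok},{integer,1,1}]}] = Parse("{ok, 1}.").
```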

Now that we’ve explored expressions, we can move on to forms. Forms are the top-level pieces of a module, and they fall under two categories: attributes (such as -module and -export) and functions. In contrast, an expression is data, a lambda, or a case expression. Some forms, like functions, have expressions in their body. Later, we will see how a module is a list of forms.

Parsing Forms

So now we parse an Erlang function with parse_form.

forms:parse_form("foobar() -> hello.").
%% => {function,1,foobar,0,[{clause,1,[],[],[{atom,1,hello}]}]}

This specific form is a function with no arguments that returns the atom ‘hello’.

Let’s take it up a notch with a function that has parameters, and use parse_form to reveal what that looks like as a form.

We will turn this function:

add(A, B) -> A + B.

into a form:

forms:parse_form("add(A, B) -> A + B.").

%% => {function,1,add,2,
%%          [{clause,1,
%%                   [{var,1,'A'},{var,1,'B'}],
%%                   [],
%%                   [{op,1,'+',{var,1,'A'},{var,1,'B'}}]}]}

Notice the ‘2’ after the ‘add’, which represents a function with an arity of 2. We also have a new type of expression, var, which represents the variables ‘A’ and ‘B’. It appears both in the clause’s parameter list and in its body. No matter how complicated the code, we can turn it into a form.
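The empty list in the middle of each clause is where guards go. The sketch below parses a function with two clauses and a guard to show that slot filled in. The sign function and the ParseForm fun are made up for illustration; ParseForm repeats the steps of parse_form so the snippet runs on its own.

```erlang
%% Same steps as forms:parse_form/1, inlined as a fun so this runs on its own.
ParseForm = fun(Str) ->
                {ok, Tokens, _} = erl_scan:string(Str),
                {ok, Form} = erl_parse:parse_form(Tokens),
                Form
            end.

%% One function form holds one clause tuple per function clause.
{function, 1, sign, 1, [Positive, CatchAll]} =
    ParseForm("sign(N) when N > 0 -> pos; sign(_) -> other.").

%% The first clause has a guard, so its third element is no longer empty.
{clause, 1,
 [{var,1,'N'}],
 [[{op,1,'>',{var,1,'N'},{integer,1,0}}]],
 [{atom,1,pos}]} = Positive.

%% The second clause has no guard, so the guard slot is the empty list again.
{clause, 1, [{var,1,'_'}], [], [{atom,1,other}]} = CatchAll.
```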

forms:parse_form works on attributes as well:

forms:parse_form("-export([foobar/0]).").
%% => {attribute,1,export,[{foobar,0}]}

forms:parse_form("-module(hello).").
%% => {attribute,1,module,hello}

These attribute forms have a subtype of export and module, respectively. The data with the export attribute is a list of tuples. The tuples contain the function name and the function arity. The data for the module attribute is simply an atom representing the module name.
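Once the shapes of attribute and function forms are known, they can be built from plain data instead of being parsed from strings. Here is a minimal sketch of that idea. ConstFun and ModuleForms are hypothetical helpers invented for this example, and they only cover zero-arity functions that return an atom.

```erlang
%% Hypothetical helper: builds a zero-arity function form that returns an atom.
ConstFun = fun(Name, Value) ->
               {function, 0, Name, 0,
                [{clause, 0, [], [], [{atom, 0, Value}]}]}
           end.

%% Hypothetical helper: builds a list of module forms from
%% {FunctionName, ReturnedAtom} pairs, exporting every generated function.
ModuleForms = fun(ModName, Pairs) ->
                  Exports = [{Name, 0} || {Name, _} <- Pairs],
                  [{attribute, 0, module, ModName},
                   {attribute, 0, export, Exports}
                   | [ConstFun(Name, Value) || {Name, Value} <- Pairs]]
              end.

Generated = ModuleForms(hello, [{foobar, hello}]).
%% Generated is
%% [{attribute,0,module,hello},
%%  {attribute,0,export,[{foobar,0}]},
%%  {function,0,foobar,0,[{clause,0,[],[],[{atom,0,hello}]}]}]
```

All line numbers are set to 0 here because the forms do not come from a source file.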

Compiling Forms

We can combine these forms to feed into compile:forms and generate a module. To do so, we create a list of forms that will constitute our module. At a minimum, we need the module attribute, the export attribute, and the function we will call.

Mod = [{attribute, 0, module, hello},
       {attribute, 0, export, [{foobar, 0}]},
       {function,0,foobar,0,[{clause,0,[],[],[{atom,0,hello}]}]}].

{ok, hello, Bin} = compile:forms(Mod, [debug_info]).

{module, hello} = code:load_binary(hello, "hello.beam", Bin).

hello =:= hello:foobar().
%% => true

Let’s break this down.

Mod is the variable to store the list of forms and is used as input into compile:forms.

We pass the Mod variable into compile:forms along with the debug_info option. You don’t need the debug_info option if you generate code and load it all at runtime. However, you will want this option set if you generate this code and then write the binary to a file. The reason is that Dialyzer needs that debug_info to analyze the module like any other Erlang module. Writing out modules into the ebin directory is how several rebar3 plugins work.
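For the write-to-file case, a rough sketch might look like the following. The output path "hello.beam" in the current directory is an assumption made for the example; a real project would more likely target its ebin directory. The beam_lib call is only there to check that the abstract code was kept.

```erlang
Mod = [{attribute, 0, module, hello},
       {attribute, 0, export, [{foobar, 0}]},
       {function, 0, foobar, 0, [{clause, 0, [], [], [{atom, 0, hello}]}]}].

%% debug_info asks the compiler to keep the abstract code inside the BEAM binary.
{ok, hello, Bin} = compile:forms(Mod, [debug_info]).

%% Write the binary to disk (the path is an assumption for this example).
ok = file:write_file("hello.beam", Bin).

%% beam_lib can read the stored abstract code back out of the file.
%% Stored is a list of forms; it is not guaranteed to be identical to Mod.
{ok, {hello, [{abstract_code, {raw_abstract_v1, Stored}}]}} =
    beam_lib:chunks("hello.beam", [abstract_code]).
```

Without debug_info, the same beam_lib call would report that no abstract code is available, which is the situation Dialyzer cannot work with.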

Let’s go back to the shell session at the start of this section. code:load_binary loads the compiled BEAM binary into the runtime. The first argument is an atom for the module name. The second argument is a BEAM file name as a string. It mustn’t be a binary, or else it will fail. The filename should be the module name followed by the .beam extension. The third and final argument is the actual BEAM code binary.

Once the module gets loaded into the runtime, we can call our function hello:foobar() from our generated module.
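The parsing trick and the compile step also combine nicely. As a sketch, the snippet below writes a whole module as strings, parses each one into a form, and compiles the result. The module name greeter and the function hi are made up for this example, and ParseForm repeats the steps of parse_form so the snippet runs on its own.

```erlang
%% Same steps as forms:parse_form/1, inlined as a fun so this runs on its own.
ParseForm = fun(Str) ->
                {ok, Tokens, _} = erl_scan:string(Str),
                {ok, Form} = erl_parse:parse_form(Tokens),
                Form
            end.

%% One string per form; each must end with a period.
Source = ["-module(greeter).",
          "-export([hi/0]).",
          "hi() -> hello."].

Forms = [ParseForm(S) || S <- Source].

%% Compile the forms, load the binary, and call the generated function.
{ok, greeter, Bin} = compile:forms(Forms, [debug_info]).
{module, greeter} = code:load_binary(greeter, "greeter.beam", Bin).
hello = greeter:hi().
```

Parsing strings is convenient for prototyping, while hand-built forms give more control when parts of the module come from data.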

It is as easy as that! So now, if you are trying to be fancy with generating Erlang forms, I hope you remember these tricks to speed up development. Remember, each expression and form has a unique structure, so exploring them with this trick is helpful.

With this skill under your belt, you will be on your way to being a true Erlang ninja!
