Product
Docs
Resources
Log InSign Up

Want to learn more about Courier?

Request a demo to see how Courier works to make product notifications delightful.

wasm-code-generation-header
ENGINEERING

Build a WebAssembly Language for Fun and Profit: Code Generation

Andrew Youngwerth

September 01, 2022

The final phase of our compiler is code generation. This phase takes the AST and converts it to a set of executable instructions. In our case, WebAssembly. To accomplish this, we are going to use a popular WebAssembly compiler toolchain called binaryen.

Binaryen does most of the heavy compiling tasks for us. Tricky optimizations like dead-code removal, code folding, and more are all handled out of the box. Thanks to binaryen, we can make a powerful language with very little code.

This is the third article in the Build a programming language series. If you’re just starting out, start with the first post here before continuing on.

The Compile function

One of the trickiest tasks in code generation is what to call the code generation function. For this article, I have opted to call the main function compile. The compile function will take our root AST (a BlockNode) and return a new binaryen.Module. The binaryen.Module exposes functions that can optimize and emit wasm in various formats.

Here's the outline:

1
// src/compiler.mts
2
import binaryen from "binaryen";
3
import { AstNode, BlockNode, IdentifierNode } from "./types/index.mjs";
4
5
export const compile = (block: BlockNode): binaryen.Module => {
6
// Creates a new binaryen module that our helper functions will fill in
7
const mod = new binaryen.Module();
8
9
// The function map is used to track all the functions and their types. More on this later
10
const functionMap = generateFunctionMap(block);
11
12
// This function registers all the standard library functions we'll include with our language.
13
// This includes functions like add, subtract, etc.
14
registerStandardFunctions(mod, functionMap);
15
16
// This is where the magic happens. Because `BlockNode` is an expression, this
17
// function can recursively compile every instruction in a wispy program file
18
compileExpression({
19
expression: block,
20
mod,
21
functionMap,
22
parameters: new Map(),
23
});
24
25
// Finally, we return the binaryen module
26
return mod;
27
};

Generating A Function Map

Next we define the generateFunctionMap function. This function crawls the entire expression tree to find and register function definitions. Its important we do this before actually compiling the functions as some functions may call other functions before they've been defined.

The return type of generateFunctionMap is a map where the key is the function name and the value is an object containing all the important information about the function the compiler needs to know about. For now, all we need is returnType.

Here’s the return type definition:

1
// src/compiler.mts
2
type FunctionMap = Map<string, { returnType: number }>;

Now that we have defined our return type, we can define the actual function:

1
// src/compiler.mts
2
const generateFunctionMap = (block: BlockNode): FunctionMap => {
3
// Preview the first node (i.e. expression) of the block
4
const firstNode = block.expressions[0];
5
6
// If the first node is an identifier and the identifier is "fn", then we know this block represents a function definition.
7
if (isNodeType(firstNode, "identifier") && firstNode.identifier === "fn") {
8
// Grab the function identifier / name, and it's return type. This is the second expression of
9
// the function definition, a typed identifier.
10
const { identifier, returnType } = getFunctionIdentifier(block); // We'll define this next
11
12
// Return the function map
13
return new Map([
14
[identifier, { returnType }],
15
16
// It's possible this function may contain function definitions inside of it. So we
17
// We put all the remaining expressions of the function into a new block and scan it
18
// then we merge the resulting map with this one.
19
...generateFunctionMap({ type: "block", expressions: block.expressions.slice(3) }),
20
]);
21
}
22
23
// A block can contain multiple expressions. So, we must scan each one to see if it is a function
24
// definition. The root `BlockNode` for instance, will almost always have multiple functions.
25
return block.expressions.reduce((map, expression) => {
26
// Only block expressions can be functions
27
if (expression.type === "block") {
28
return new Map([...map, ...generateFunctionMap(expression)]);
29
}
30
31
// We can ignore all other expression
32
return map;
33
}, new Map());
34
};

Onto getFunctionIdentifier we called earlier. This function is simple. It takes a BlockNode, ensures that the second identifier is a TypedIdentifierNode, and then returns the identifier and return type:

1
// src/compiler.mts
2
const getFunctionIdentifier = (block: BlockNode) => {
3
// Grab the second expression
4
const node = block.expressions[1];
5
6
// Ensure the expression is a typed identifier
7
if (!isNodeType(node, "typed-identifier")) {
8
throw new Error("Expected typed function name");
9
}
10
11
return {
12
identifier: node.identifier,
13
14
// We have to map the return type to a type binaryen understands.
15
returnType: mapBinaryenType(node.typeIdentifier),
16
};
17
};

As noted in the getFunctionIdentifier function, Binaryen doesn't understand what type the string typeIdentifier is. To handle this we have to map our defined types to binaryen types. For now, we'll just support i32 and f32. Thankfully binaryen uses the same nomenclature we do. So the map function is pretty simple:

1
// src/compiler.mts
2
const mapBinaryenType = (typeIdentifier: string): binaryen.Type => {
3
if (typeIdentifier === "i32") return binaryen.i32;
4
if (typeIdentifier === "f32") return binaryen.f32;
5
throw new Error(`Unsupported type ${typeIdentifier}`);
6
};

getFunctionIdentifier Made a call to a new function, isNodeType. This function is essentially the same concept as the isTokenType function we defined in the parsing article, only for ASTNode instead of Token.

Here's the definition:

1
// src/compiler.mts
2
export const isNodeType = <T extends AstNode["type"]>(
3
item: unknown,
4
type: T
5
): item is Extract<AstNode, { type: T }> => {
6
return (
7
// Ensure the type exists
8
!!item &&
9
// Ensure the type is an object
10
typeof item === "object" &&
11
// Cast the type as a record, so TypeScript doesn't get mad at us and then compare the
12
// type field with the type parameter. If they are equal, we know the node is the
13
// the type we were looking for.
14
(item as Record<string, unknown>)["type"] === type;
15
)
16
};

With the mapper finished we can start generating some code.

Compiling Expressions

The compileExpression function is where we really start to make use of binaryen to model the generated machine code. Because of the tree structure of an, ahem, abstract syntax tree, compileExpression is highly recursive. This is one of my favorite things about programming languages, their patterns tend to lend themselves to elegant recursive functions with high levels of code re-use.

Let's start with defining the parameters of compileExpression. We will need to pass the binaryen.Module and the functionMap we created earlier, the actual expression we are compiling, and any parameters of the function this expression may be a part of (if it's inside) of a function. When there are more than two parameters of a function it can be difficult to visually keep track of what is what. So I like to make it clear by grouping them together in an object. This enforces labeling the parameters on call and as a result, improves code readability.

Here's the interface of that object:

1
// src/compiler.mts
2
interface CompileExpressionOpts {
3
expression: AstNode;
4
mod: binaryen.Module;
5
functionMap: FunctionMap; // We defined this earlier
6
parameters: ParameterMap; // Defined below.
7
}
8
9
// A map where the key is the parameter identifier and the value is the important information
10
// required by binaryen to fetch the parameter down the line
11
type ParameterMap = Map<string, { index: number; type: number }>;

Now that we have the options for compileExpression defined, we can define the actual function. compileExpression takes CompileExpressionOpts as a parameter and returns a number. The job of this function is to take an expression and determine what type of expression it is, from there it can pass the expression to another compiler function that can handle that specific type of expression.

Why return a number? When we build an expression with binaryen it returns a number as an identifier for that expression. This allows us to compile an expression ahead of time and then reference that expression later down the line.

Here's the definition:

1
// src/compiler.mts
2
const compileExpression = (opts: CompileExpressionOpts): number => {
3
// Grab the expression and the binaryen module (mod) from the options.
4
// The other fields are used by child function calls
5
const { expression, mod } = opts;
6
7
// Map the expression node to it's corresponding specific compiler
8
if (isNodeType(expression, "block")) return compileBlock({ ...opts, expression });
9
10
// Numbers are simple enough to compiler that we can just inline the compiler here.
11
// They are represented as constants
12
if (isNodeType(expression, "int")) return mod.i32.const(expression.value);
13
if (isNodeType(expression, "float")) return mod.f32.const(expression.value);
14
15
if (isNodeType(expression, "identifier")) return compileIdentifier({ ...opts, expression });
16
17
// Throw a helpful error message if we don't recognize the expression
18
throw new Error(`Unrecognized expression ${expression.type}`);
19
};

Let's define the compileBlock functions. Because this function is also compiling an expression, we can re-use the previously defined CompileExpressionOpts, but we'll narrow the expression field to the BlockNode type, since we know we are compiling a block by the time this function is called:

1
interface CompileBlockOpts extends CompileExpressionOpts {
2
expression: BlockNode;
3
}
4
5
const compileBlock = (opts: CompileBlockOpts): number => {
6
// We re-map the expression field to block here for clarity.
7
const { expression: block, mod } = opts;
8
9
// When a block has multiple expressions and the first one is an identifier, that means
10
// the block is actually a function call.
11
if (isNodeType(block.expressions[0], "identifier") && block.expressions.length > 1) {
12
// If it is a function call, transfer responsibility to the `compileFunctionCall` function (defined next)
13
return compileFunctionCall(opts);
14
}
15
16
// This is where the recursive beauty starts to show. Since every value of a block
17
// is an expression, we can map each one back to the compileExpression function.
18
const expressions = block.expressions.map((expression) => {
19
return compileExpression({ ...opts, expression });
20
});
21
22
// Now we generate the machine code by calling the block function of binaryen
23
// This function takes a block name, an array of compiled expressions, and a block return type.
24
// Named blocks are mostly useful for looping constructs like `for` and `while`. In this
25
// case we can pass null as we're not compiling a loop construct. Additionally, we can
26
// pass `auto` as the type since binaryen is smart enough to determine the return type
27
// of blocks automatically.
28
return mod.block(null, expressions, binaryen.auto);
29
};

Note: If you're curious to see how looping works in binaryen / WebAssembly works, check out my blog post on the subject here. Spoiler alert, its pretty weird.

The last simple expression we'll compile in this section is the identifier expression. If compileExpression was passed a lone IdentifierNode it means that the expression evaluates to the value of the identifier. In wispy, we don't have variables and function identifiers are caught before the could've been passed here. That means the only thing IdentifierNode can resolve to is a parameter.

Here's the definition:

1
interface CompileIdentifierOpts extends CompileExpressionOpts {
2
expression: IdentifierNode;
3
}
4
5
const compileIdentifier = (opts: CompileIdentifierOpts): number => {
6
// We remap expression to node to keep our lines a little shorter
7
const { expression: node, parameters, mod } = opts;
8
9
// Since we know the identifier has to be a parameter, we look it up in our
10
// parameter map. Don't worry, we'll define the parameter map in the next section
11
const info = parameters.get(node.identifier);
12
if (!info) {
13
throw new Error(`Unrecognized identifier ${node.identifier}`);
14
}
15
16
// Finally, we use the local.get instruction to return the parameter value.
17
// Binaryen needs to know the parameters index and type. We'll get into
18
// the index when we define our parameter mapping function.
19
return mod.local.get(info.index, info.type);
20
};

The final expression type left to compile is a function call. This is interesting enough to warrant its own section.

Compiling Function Calls

In wispy function calls are blocks with multiple expressions where the first expression is an identifier. The job of compileFunction is to determine which function is being called, what its parameters and return type are, and finally, building the call instruction with binaryen.

Here's the definition:

1
// src/compiler.mts
2
3
// Because function calls are blocks, we can re-use CompileBlockOpts
4
const compileFunctionCall = (opts: CompileBlockOpts): number => {
5
const { expression, functionMap, mod } = opts;
6
7
// The first expression of a function call is the functions identifier
8
const identifierNode = expression.expressions[0];
9
10
// Here we just ensure the identifierNode is *actually* an identifier. Otherwise we throw an error.
11
if (!isNodeType(identifierNode, "identifier")) {
12
throw new Error("Expected identifier when compiling function call");
13
}
14
15
// Next we create a reference to what the actual identifier is
16
const identifier = identifierNode.identifier;
17
18
// If the identifier is "fn", the function we are calling is the function to define functions!
19
// That's right! Functions are created by another function. Pretty neat if you ask me.
20
if (identifier === "fn") return compileFunction(opts); // We'll define this next
21
22
// Ifs are special functions. They may or may not have an else block. Binaryen needs to know
23
// if the else block exists at compile time, so we have a special if compiler for this.
24
if (identifier === "if") return compileIf(opts); // We'll define this later
25
26
// Every other function is either part of the standard library, or is defined
27
// within the wispy code itself.
28
const functionInfo = functionMap.get(identifier);
29
if (!functionInfo) {
30
throw new Error(`Function ${identifier} not found`);
31
}
32
33
const params = expression.expressions
34
// Every other expression in the block are parameters to the function, so we compile them
35
// and then pass them to the call
36
.slice(1)
37
.map((expression) => compileExpression({ ...opts, expression }));
38
39
// Now we use binaryen to construct the call expression. The first parameter
40
// is the functions identifier, the second are the compiled parameter expression,
41
// and the third is the return type which has already been determined by generateFunctionMap
42
return mod.call(identifier, params, functionInfo.returnType);
43
};

Let's define the compileIf function before we move onto the compileFunction... function.

1
// src/compiler.mts
2
3
const compileIf = (opts: CompileBlockOpts): number => {
4
const { expression, mod } = opts;
5
6
// The first expression, expression.expressions[0], is the "if" identifier, we don't need
7
// to do anything with it since we already know we are compiling an if expression
8
9
// The second expression is the if condition
10
const conditionNode = expression.expressions[1];
11
12
// The third expression is the ifTrueNode, it's what is executed if the conditionNode evaluates to
13
// true
14
const ifTrueNode = expression.expressions[2];
15
16
// Finally the fourth expression (which may or may not exist) is what is executed if the condition
17
// evaluates to false
18
const ifFalseNode = expression.expressions[3];
19
20
// Compile the condition expression
21
const condition = compileExpression({ ...opts, expression: conditionNode });
22
23
// Compile the ifTrue Expression
24
const ifTrue = compileExpression({ ...opts, expression: ifTrueNode });
25
26
// Check to see if the ifFalseNode exists, if it does, compile it, otherwise set ifFalse to undefined
27
const ifFalse = ifFalseNode ? compileExpression({ ...opts, expression: ifFalseNode }) : undefined;
28
29
// Finally we use binaryen to compile the if expression
30
return mod.if(condition, ifTrue, ifFalse);
31
};

Compiling Function Definitions

Function definitions are a whole lot like function calls, so the function structure is pretty similar. We take CompileBlockOpts and return a number (the binaryen expression reference).

Here's the definition:

1
// src/compiler.mts
2
3
const compileFunction = (opts: CompileBlockOpts): number => {
4
const { expression: block, mod } = opts;
5
6
// We need to tell binaryen what the identifier and return type of the function is
7
// Thankfully, we already wrote a function for that, getFunctionIdentifier. We
8
// could also have just looked up this information with the functionMap, but
9
// this is more fun.
10
const { identifier, returnType } = getFunctionIdentifier(block);
11
12
// Next we grab the function parameters. This is the third expression of the function
13
const { parameters, parameterTypes } = getFunctionParameters(block); // Defined later
14
15
// The rest of the expressions in the function are the functions block. So we create
16
// a new BlockNode from the remaining expression.
17
const body = compileBlock({
18
...opts,
19
expression: {
20
type: "block",
21
expressions: block.expressions.slice(3),
22
},
23
24
// We need to pass the parameters of this function, so they can be referenced in child
25
// expressions
26
parameters,
27
});
28
29
// Now we register the function with binaryen. Binaryen takes the function identifier,
30
// an array of parameter types (each item being the type of parameter in order),
31
// the function's return type, a list of variable types (wispy doesn't have any, so we pass an empty array)
32
// and finally the compiled body of the function.
33
mod.addFunction(identifier, parameterTypes, returnType, [], body);
34
35
// To make things easy we export every single function defined in a wispy file
36
// so it can be called by the WebAssembly host.
37
mod.addFunctionExport(identifier, identifier);
38
39
// Because function definitions are *technically* expressions that can be a part of another function
40
// body, we need to return an expression pointer. For this, we just return a nop (do nothing instruction),
41
// to make things consistent.
42
return mod.nop();
43
};

Now, let's define the getFunctionParameters function. This function takes the function BlockNode, that is, the entire unmodified function definition, and extracts its parameters. The function returns two values, parameters and parameterTypes.

The first returned value, parameters, is a map where the key is the parameter identifier, and the value is the information needed to access the parameter down the line within the function body.

The second returned value is an array of binaryen types. There is one type for each defined parameter, and they must remain in the order they are defined. This is because binaryen doesn't reference parameters by their names, instead it references them by the index of the array in which they are defined. Don't worry if this is confusing to you, the code should make things a little more clear. If you need, refer to the compileIdentifier definition, to get a better understanding of how this works in practice.

Here's the definition:

1
// src/compiler.mts
2
3
type ParameterMap = Map<string, { index: number; type: number }>;
4
5
const getFunctionParameters = (block: BlockNode) => {
6
// The parameters are defined in the third expression of the function definition
7
const node = block.expressions[2];
8
9
// Check to make sure the third expression is a block
10
if (!isNodeType(node, "block")) {
11
throw new Error("Expected function parameters");
12
}
13
14
// Now we reduce the parameters into a parameter map, and a list of binaryen types
15
const { parameters, types } = node.expressions.reduce(
16
(prev, node, index) => {
17
// First, ensure that the node is a typed-identifier. Every parameter must be a
18
// typed identifier, therefore, every node in this reducer must be a typed identifier.
19
if (!isNodeType(node, "typed-identifier")) {
20
throw new Error("All parameters must be typed");
21
}
22
23
// Determine the correct binaryen type of the parameter
24
const type = mapBinaryenType(node.typeIdentifier);
25
26
// Add the parameter's type to the list of types we've defined so far
27
const types = [type, ...prev.types];
28
29
// Now add the parameter to the parameter map. We save the parameters index and type.
30
// The index and type is used binaryen to access the parameter when it is used
31
// later in the function body
32
const parameters = new Map([[node.identifier, { index, type }], ...prev.parameters]);
33
34
// Return updated parameters map and types array
35
return {
36
parameters,
37
types,
38
};
39
},
40
// Here we are setting the starting values for our reducer function and casting the default
41
// type so typescript can correctly infer the `prev` parameter type
42
{ parameters: new Map(), types: [] } as {
43
parameters: ParameterMap;
44
types: number[];
45
}
46
);
47
48
// Finally we return the parameter map and the parameterTypes
49
return {
50
parameters,
51
52
// Note: parameterTypes is a number, instead of an array of numbers as you'd expect.
53
// So we have to use binaryen.createType to create a new type that is referenced
54
// the mod.addFunction function. This is one inconsistency with the binaryen API. Parameters
55
// are defined as a number, and variables are defined as an array of numbers. I'm sure there
56
// is a reason for this, but I don't know what that reason is.
57
parameterTypes: binaryen.createType(types),
58
};
59
};

Now all that's left is to define the standard library. This part of the code isn't super interesting. We are essentially just mapping primitive WebAssembly instructions to a name to be referenced within wispy.

Here's the definitions. The only important information is the name we are associating with each instruction:

1
// src/compiler.mts
2
3
const registerStandardFunctions = (mod: binaryen.Module, map: FunctionMap) => {
4
const { i32, f32 } = binaryen;
5
const { i32: i32m, f32: f32m } = mod;
6
const common = { mod, map };
7
registerLogicFunction({ name: "lt_i32", type: i32, operator: i32m.lt_s, ...common });
8
registerLogicFunction({ name: "gt_i32", type: i32, operator: i32m.gt_s, ...common });
9
registerLogicFunction({ name: "eq_i32", type: i32, operator: i32m.eq, ...common });
10
registerLogicFunction({ name: "lt_f32", type: f32, operator: f32m.lt, ...common });
11
registerLogicFunction({ name: "gt_f32", type: f32, operator: f32m.gt, ...common });
12
registerLogicFunction({ name: "eq_f32", type: f32, operator: f32m.eq, ...common });
13
registerMathFunction({ name: "add_i32", type: i32, operator: i32m.add, ...common });
14
registerMathFunction({ name: "sub_i32", type: i32, operator: i32m.sub, ...common });
15
registerMathFunction({ name: "mul_i32", type: i32, operator: i32m.mul, ...common });
16
registerMathFunction({ name: "add_f32", type: f32, operator: f32m.add, ...common });
17
registerMathFunction({ name: "sub_f32", type: f32, operator: f32m.sub, ...common });
18
registerMathFunction({ name: "mul_f32", type: f32, operator: f32m.mul, ...common });
19
registerMathFunction({ name: "div_f32", type: f32, operator: f32m.div, ...common });
20
};
21
22
const registerMathFunction = (opts: {
23
mod: binaryen.Module;
24
name: string;
25
type: number;
26
operator: (left: number, right: number) => number;
27
map: FunctionMap;
28
}) => {
29
const { mod, name, type, operator, map } = opts;
30
return registerBinaryFunction({
31
mod,
32
name,
33
paramType: type,
34
returnType: type,
35
operator,
36
map,
37
});
38
};
39
40
const registerLogicFunction = (opts: {
41
mod: binaryen.Module;
42
name: string;
43
type: number;
44
operator: (left: number, right: number) => number;
45
map: FunctionMap;
46
}) => {
47
const { mod, name, type, operator, map } = opts;
48
return registerBinaryFunction({
49
mod,
50
name,
51
paramType: type,
52
returnType: binaryen.i32,
53
operator,
54
map,
55
});
56
};
57
58
const registerBinaryFunction = (opts: {
59
mod: binaryen.Module;
60
name: string;
61
paramType: number;
62
returnType: number;
63
operator: (left: number, right: number) => number;
64
map: FunctionMap;
65
}) => {
66
const { mod, name, paramType, returnType, operator, map } = opts;
67
mod.addFunction(
68
name,
69
binaryen.createType([paramType, paramType]),
70
returnType,
71
[],
72
mod.block(
73
null,
74
[operator(mod.local.get(0, paramType), mod.local.get(1, paramType))],
75
binaryen.auto
76
)
77
);
78
map.set(name, { returnType });
79
};
By now src/compiler.mts should [look something like this](https://github.com/drew-y/wispy/blob/58d5872f8d927358c1f8f70ebb4dda6d9458a8c8/src/compiler.mts). With that, our compiler is finished. It's time to execute some wispy!

Putting It All Together

Now that we have finished our compiler, we can finally run our code.

First, replace the contents of src/index.mts with this:

1
// src/index.mts
2
3
#!/usr/bin/env node
4
import { readFileSync } from "fs";
5
import { compile } from "./compiler.mjs";
6
import { lex } from "./lexer.mjs";
7
import { parse } from "./parser.mjs";
8
9
const file = process.argv[2];
10
const input = readFileSync(file, "utf8");
11
12
const tokens = lex(input);
13
const ast = parse(tokens);
14
15
// !! New !!
16
const mod = compile(ast);
17
18
// This is sneakily where the code gen is *actually* happening
19
const binary = mod.emitBinary();
20
21
// Use the standard WebAssembly API to convert the wasm binary to a compiled module
22
// our host NodeJS/v8 can use
23
const compiled = new WebAssembly.Module(binary);
24
25
// Build the instance, here you would add any external functions you might want to import into
26
// the WebAssembly module
27
const instance = new WebAssembly.Instance(compiled, {});
28
29
// Finally, run the main function and log the result. We have to cast instance.exports to any
30
// The standard TypeScript types appear to be wrong.
31
console.log((instance.exports as any).main());

Now build and run project:

1
npx tsc
2
wispy example.wispy

If all goes well (and you passed the number 15 to fib), you should see the number 610 in the output of your console. If so, you've done it, you've made a working WebAssembly language. Congrats!

A full copy of the wispy language implementation is available under an open-source license at https://github.com/drew-y/wispy.

Want to learn more about Courier?

Request a demo to see how Courier works to make product notifications delightful.

More from Engineering

serverless-lie-detector-thumbnail
ENGINEERINGINTEGRATIONS

Create a Discord Bot that Automates Secret Messages with Node.js

Our plan is to create and install a Discord bot that automates encrypted messages to the civilians and alerts them about the situation so that they can escape.

Shreya Gupta

Shreya Gupta

September 27, 2022

serverless-lie-detector-thumbnail
ENGINEERINGINTEGRATIONS

Build a Serverless Lie Detector that uses AI for Facial Recognition

When the Face API recognizes that one of our spies is being deceitful, we will use Courier to broadcast the identity of the mole to our spy network.

Shreya Gupta

Shreya Gupta

September 12, 2022

Build your first notification in minutes

Send up to 10,000 notifications every month, for free.

Email & push notification

Build your first notification in minutes

Send up to 10,000 notifications every month, for free.

Email & push notification

Product

Pricing

Providers

Developers

Documentation

API

Libraries

Status

© 2022 Courier. All rights reserved.