Learning F# — A Simple Parser
- fsharp
A couple of weeks have passed since my last post and I still really enjoy learning F#. So when I found this interesting post called “Emulator basics: a stack and register machine” in which the author builds a virtual machine for running simple C programs, I decided to port the code from EcmaScript.
I won’t go into details regarding the actual VM design and its instruction set, the original post already explains it nicely. Let’s focus on the type of source code we’ll be parsing instead:
plus:
ADD RDI, RSI
MOV RAX, RDI
RET
main:
MOV RDI, 1
MOV RSI, 2
CALL plus
RET
This program will start at the label main:
, add 2 numbers into registers and then call the procedure defined after the label plus:
to add them. The result of the addition (3) will be the exit code of this program.
The Parser
I decided to represent the result of the parser as a record type called ParseResult
which consists of a list of tuples containing instructions and their optional arguments (e.g. ("add", ["RDI"; "RSI"])
or ("ret", [])
) as well as a map which tracks labels and their corresponding line numbers:
type ParseResult =
{ instructions: (string * string list) list
labels: Map<string, int> }
Patterns
F#’s active patterns are one of my favorite language features. Like most newcomers to the language I tend to overuse them, though in this specific case they seem like a good fit and a lightweight alternative to parser combinators or similar solutions.
let (|Empty|Directive|LineComment|Label|Instruction|) (line : string) =
if line = "" then Empty
else if line.StartsWith(".") then Directive
else if line.StartsWith(";") || line.StartsWith("#") then LineComment
else if line.Contains(":") then Label (line.Split(":").[0])
else Instruction (line.Split([|';'; '#'|]).[0])
Active patterns allow us to define named partitions of data, which can be used in pattern matching expressions just like constructors of discriminated unions. This specific pattern first checks if the line is empty, in which case it returns the Empty
identifier. Lines that start with a .
character are directives (Directive
) and will be ignored, just like full line comments (LineComment
) starting with ;
or #
. The Label
pattern matches any line containing a :
and will only return the part before the colon (e.g. main:
becomes main
). All other lines represent instructions and will return their content stripped of comments.
Parsing
The actual parsing logic is relatively simple:
let parse (source : string) : ParseResult =
source.Split("\n")
|> Array.fold (fun result (line : string) ->
match line.Trim() with
| Directive | Empty | LineComment -> result
| Label l ->
let labels = Map.add l result.instructions.Length result.labels
{ result with labels = labels }
| Instruction i ->
let instruction = parseInstruction i
{ result with instructions = (instruction :: result.instructions) }
) { instructions=[]; labels=Map.empty }
|> fun result -> { result with instructions = List.rev result.instructions }
The input source is split into lines and then Array.fold
is used for pattern matching every line and updating the ParseResult
record as needed. Directive
, Empty
and LineComment
have no effect on the program, so the result will be returned unmodified. Label
uses a copy and update record expression to add a new value to the map, while Instruction
does something similar for the list of tuples, using the following parseInstruction
helper function:
let private parseInstruction (instruction : string) =
let operation = instruction.Split(" ").[0].ToLower()
let operands =
instruction.Substring(operation.Length).Split(",")
|> Array.map (fun (s : string) -> s.Trim())
|> Array.filter ((<>) "")
|> Array.toList
operation, operands
This splits the operation and its operands and then cleans up the latter.
The Result
Calling this with the example program listed above, gives the following output in F# Interactive:
parse source;;
val it : ParseResult =
{instructions =
[("add", ["RDI"; "RSI"]); ("mov", ["RAX"; "RDI"]); ("ret", []);
("mov", ["RDI"; "1"]); ("mov", ["RSI"; "2"]); ("call", ["plus"]);
("ret", [])];
labels = map [("main", 3); ("plus", 0)];}
Summary
Like the original parser this is brittle. On the other hand it shows off a nice F# feature (active patterns) and the whole code weighs in at only about 30 SLOC. Given that I’m still relatively new to the language this is probably not the nicest piece of code, but it was fast to write and pretty much worked on first try once the types checked out. Stay tuned for the next installment in this series where I’ll tackle the interpreter part of the original post.