Precedence: Chapter 1
One of our many fun side projects is an interpreter for a language we invented called INH (in-huh). Now, we would be the first to admit to you that we don’t really know how to write interpreters, and that we sort of jumped in head first without any design whatsoever. I don’t recommend this approach if you’re trying to build the next great programming language. However, it can be quite fun, and it’s a great way to learn.
I want to share one of the interesting problems we ran into as we built our interpreter. It came out of a bug we noticed as we were testing: our math was wrong! Any time we had a non-trivial mathematical expression, it usually produced the incorrect value. We did some digging and found out that our interpreter wasn’t respecting the order of operations.
So, we set out to fix the problem. Little did we know that it would open up into a world of discovery.
A Quick Recap of High School Math
Consider the expression: 1 + 2 * 3 + 7
A naive way to calculate this is left to right. 1 + 2 = 3. 3 * 3 = 9. 9 + 7 = 16. But the correct value is 14. That’s weird, right? Turns out, it’s not weird at all, it’s PEMDAS. Mathematical operations need to be processed in a certain order, regardless of how they are written.
If we follow our PEMDAS rules, our expression becomes 1 + (2 * 3) + 7, or 1 + 6 + 7, or 14, as any functioning calculator, or google search, (or astute reader) will tell us.
What does this have to do with interpreters?
I’m glad you asked! Obviously, if we want users of our language to be able to evaluate mathematical expressions, we need to account for these precedence rules. But first, before we can figure that out, we need to know a bit more about how a compiler or interpreter turns code into a program. How does it store the code in memory? How does it evaluate the code once it is all parsed and stored? Let’s take baby steps.
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
-Linus Torvalds
Finding the right data structure is often the key to solving really hard programming problems. In our case, there is an awesome data structure that is used to store code that programmers write. It is known as a Syntax Tree.
Syntax Trees are great, and you’re in luck, because we’re going to learn about them right now!
Trees
We’ll start with the word ‘Tree’. Trees are a ubiquitous data structure in software engineering. If you’ve written code, you’ve probably run into them whether you realized it or not (some of them are quite hidden). Simply put, a tree is an object that stores some data, and also stores pointers to “children”, where each child is, itself, a tree. Here’s an example:
Notice how the 5 is the parent of the entire tree, but the 7 is also the parent of a little mini tree that contains 9, 2, and 6? Pretty cool, right?
You can see why they’re called trees, as they kinda look like Christmas trees when they’re drawn out this way. Also, if you flip this upside down, it resembles the branching behavior of real trees. In this tree, each node holds a number, but a tree could store any data. Names, objects, or even pieces of a program.
Syntax
The next word is syntax. Syntax is the set of rules that govern valid expressions. In English, syntax controls what words can be used to form valid sentences. For example, the sentence “Man, I really thought dog this afternoon” is syntactically incorrect because the noun “dog” can’t be there. You need more words! The same is true for programming languages, except programming languages usually have far stricter syntax requirements than written or spoken languages (however, they do tend to evolve based on how they’re used, much like written or spoken languages).
Here’s some examples of syntactically incorrect programming expressions:
A = 5 +
You can’t just have an operator that needs two arguments like plus, but not provide both arguments!
B = (4 + (3 / 2 + 1) / 2
You’ve got to close your parenthesis!
If you’re like me, these make you cringe a little.
So as you might imagine, elements of a syntax tree represent elements of syntax. It’s quite literally a tree of syntax chunks. For example:
1 + 5, in treespeak, translates to:
A = B + 2, in treespeak, is:
In a compiled program, a syntax tree is converted into object code, but in an interpreter like the one we’ve written, we can gleefully skip that step and directly evaluate the syntax tree itself. In the next chapter, we’ll see how that works!