Nand to Tetris - Virtual Machine II

vm2

In my first blog post on building a virtual machine, we went over arithmetic, logical, and mem segment commands. But this is a limited view of programming. For example, push, pop, add, and sub are all linear.

In this blog post we'll complete the implementation of the virtual machine. This includes branching, arithmetic, logical, memory access, and function commands.

Demos:


Arithmetic/Logical commands:

  • add
  • sub
  • neg
  • eq
  • gt
  • lt
  • and
  • or
  • not

Memory segment commands:

  • pop segment i
  • push segment i

Program Control

The flow of the program takes twists and turns.

x = -b + sqrt(power(b,2) - 4 * a * c)

We can make a method for power(b,2) - 4 * a * c to calculate the discriminant:

x = -b + sqrt(disc(a,b,c))

These are all abstractions. We can invent new subroutines. We can also branch based on conditions:

if !(a==0)
    x = (-b + sqrt(disc(a,b,c)))/(2*a);
else
    x = -c/b;

Branching commands:

  • goto label
  • if-goto label
  • label label

Function commands:

  • call function
  • function function
  • return

This high-level language compiles into VM code, gets translated into assembly, then translated into binary by an assembler.

Branching

Without branching, programs would be linear. With branching we can change the flow of control of the program. Forward, backward, looping, you name it and we can do it with branching.

There are two kinds of branching, unconditional and conditional.

Conditional (requires pushing condition to the stack before an if-goto command):

  • if-goto label
  • cond = pop;
  • if cond jump to execute the command after label.

Unconditional we goto/jump to a different position in the program. But we wouldn't need a condition.

Functions

We extend high-level programming languages using:

  • Methods
  • Subroutines
  • Functions
  • Procedures
  • etc.

All these can classify as functions. With a primitive function like add or sub we can create abstract functions for ourselves in VM language. If you want to call a function that expects n arguments, you push them onto the stack, and you call the function. After performing the logic on the arguments, the return value of the function should replace these n arguments on the stack. That is a function.

When we call a function during runtime, we have to:

  • Pass parameters of the calling function to the called function
  • Determine the return address with the caller's code
  • Save the caller's return address, stack, and memory segments
  • Jump to execute the called function

When the called function returns, we have to:

  • Return to the caller the value computed by the called function
  • Recycle the memory resources used by the called function
  • Bring back the state of the caller's stack and memory segments
  • Jump to the return address in the caller's code

Function, Call, and Return

Now I'll show some examples of a high-level program translating at compile time to VM language.

Example high level program:

class Main { function int main() { return Main.factorial(3); } function int factorial(int n) { if (n = 1) { return 1; } else { return Math.multiply(n, Main.factorial(n-1)); } } }

Example Pseudo VM code:

function main push 3 call factorial return function factorial(n) push n push 1 eq if-goto BASECASE push n push n push 1 // have n on stack and give (n-1) to factorial sub call factorial call mult // or could use OS's multiply return label BASECASE push 1 return function mult(a,b) // ...

Actual VM program:

function Main.main 0 push constant 3 call Main.factorial 1 return function Main.factorial 0 push argument 0 push constant 1 eq if-goto IF_TRUE0 goto IF_FALSE0 label IF_TRUE0 push constant 1 return goto IF_END0 label IF_FALSE0 push argument 0 push argument 0 push constant 1 sub call Main.factorial 1 call Math.multiply 2 return label IF_END0

Here is what the global stack would look like at runtime before reaching the base case of factorial(3).

global stack
ref arg0 3
return address saved main frame
f(3) 3
f(3) and ref arg0 2
return address saved f(3)
f(2) 2
f(2) and ref arg0 1
return address saved f(2)
f(1) 1

Since we do call factorial 1 we know that there is only one argument passed to the factorial function. So, we'd reference the top most value in the scoped stack to argument 0 for factorial(3).

After reaching the base case, the program would resume back at the call mult 2 line for each subroutine (except main()). Then, we'd copy the return value onto argument 0 for each scoped stack.

The final result would be:

3 * 2 * 1

The main function pushed 3 to the stack, called factorial, and got 6!

From the caller's point of view, it expects the following:

  • Before calling a function, we must push as many arguments as the function needs to the stack
  • We invoke the function using funcName nArgs
  • After the function returns, the argument values pushed onto the stack disappears. Then we replace with a return value (always exists) at the top of the stack
  • After the function returns, all memory segments should be in to the same state as they were before the call (but temp is undefined and static may have changed).

From the callee's perspective it expects the following:

  • argument segment has been initialized with arguments passed by the caller
  • local variable segment has been allocated and initialized as 0s
  • Static segment has been set to the static segment of the VM file (class) to which the callee belongs
  • Memory segments this, that, pointer, and temp are undefined upon entry.
  • Working stack is empty
  • Before returning, a value must be pushed onto the stack. Void functions still return a value like 0. It's the caller's responsibility to do something with this value. If the caller knows it called a function who's return value is void, it can simply toss away the return value.

VM Translator

The VM translator reads VM code and translates it into assembly. In our case, it's hack assembly. But I'll show pseudo assembly code so it's platform independent. Keep in mind that most VM language is written by compilers. Functions from classes are denoted as className.functionName.

So, something like:

function Foo.main 4 // ... // computes -(16 * (local 3)) push constant 16 push local 3 call Bar.mult 2 neg // ... function Bar.mult 2 // Returns product of two arguments // ... push local 1 return

Would be translated to something like:

(Foo.main) // setup of function exec // function exec // ... // asm for push constant 16 // asm for push locla 3 // asm to save caller's state // setup for function call goto Bar.mult // in asm (Foo$ret.1) // asm that handles neg // ... (Bar.mult) // setup of func exec // function exec // ... // asm for push local 1 // asm that moves return value to caller, // reinstates the caller's state, and then: goto Foo$ret.1 // in asm

Handling call

call functionName nArgs calls the functionName and says nArgs have been pushed to the stack. When we encounter this VM command, call functionName nArgs, there's a few things we need to do.

Pseudo assembly code (generated by the VM translator):

push returnAddress // using declared label below push LCL // saves LCL of caller push ARG // saves ARG of caller push THIS // saves THIS of caller push THAT // saves THAT of caller ARG = SP - 5 - nArgs // reposition ARG LCL = SP // reposition LCL goto functionName // program control shift to function (returnAddress) // declare label for return address

Handling function

function functionName nVars is the starting point for a function that has nVars local variables in VM language.

Pseudo assembly code (generated by the VM translator):

(functionName) // create label for function start // repeat nVars times: push 0 // allocate and init local segment with 0s

Handling return

return is the VM command that indicates to end the current function and go back to the return address with a return value.

Pseudo assembly code (generated by the VM translator):

endFrame = LCL // endFrame is a temp variable retAddr = *(endFrame - 5) // get return address // return value copied onto ARG[0] // reposition return value for caller // grabs return value from top of stack *ARG = pop() SP = ARG + 1 // reposition SP of caller back to normal // restore state of caller THAT = *(endFrame-1) THIS = *(endFrame-2) ARG = *(endFrame-3) LCL = *(endFrame-4) goto retAddr // cont program control at return address in caller

Summary

I showed the assembly code that would end up building the state of a global stack. This global stack gets built through turns in program control from functions. We maintain and build this global stack during program runtime.

We've gone over how to make a VM on any target platform. We test our VM translator using VM test programs, a CPU emulator, and test/compare scripts. We'll take a directory of Jack files, compile to VM language files, and translate into one assembly file. In our case the project will for the hack computer. But we could use these general guidelines to take VM code and generate assembly for other platforms. Platform independence is pretty sweet!

Resources

  • Nisan, Noam, and Shimon Schocken. The Elements of Computing Systems: Building a Modern Computer from First Principles. MIT, 2021.