HW4 - Greatest Common Divisor

This is the second of two assignments in which you will write binary code for the simple machine you created a simulator for in lab. In this assignment, we will be using our modulo code from hw3 and extending our ISA to provide better support for function calls.

Adding a Stack

Our initial machine did not directly support a stack. While we could have managed a stack in our ISA using icodes 1, 2, and 3, we would like to provide the functionality at the hardware level. To do that, we decided to introduce a new register: the stack pointer rsp. This register will contain the memory address of the top of our stack.

Aside: In our machine’s version of the stack, we will only store 8-bit values. However, in practice we will want to store data of different sizes in our stack. We will see more of this as we move to x86-64.

The (Updated) Instructions

Remember that when our reserved bit (the 7th bit) is 0, we used the following icodes:

`icode` Behaviors

icode operation

rA = rB

rA += rB

rA &= rB

rA = read from memory at address rB

write rA to memory at address rB

do different things for different values of b:

`b`	action
0	`rA = ~rA`
1	`rA = -rA`
2	`rA = !rA`
3	`rA = pc`

do different things for different values of b:

`b`	action
0	`rA =` read from memory at `pc + 1`
1	`rA +=` read from memory at `pc + 1`
2	`rA &=` read from memory at `pc + 1`
3	`rA =` read from memory at the address stored at `pc + 1`

In all 4 cases, increase pc by 2, not 1, at the end of this instruction

Compare rA (as an 8-bit 2’s-complement number) to 0; - if rA <= 0, set pc = rB - otherwise, increment pc like normal.

One more icode

Now, we will add one more icode when the reserved bit is set to 1:

icode operation

do different things for different values of b:

`b`	action
0	Decrement `rsp` and push the contents of `rA` to the stack
1	Pop the top value from the stack into `rA` and increment `rsp`
2	Push `pc+2` onto the stack, set `pc = M[pc+1]`
3	`pc =` pop the top value from the stack

If b is not 2, update the pc as normal.

The first 2 actions (b = 0, 1) allow us to interact with the stack. When we push a value onto the stack, we decrement rsp to make space, then write the value of register A to memory at the address stored in rsp. When a value is popped off the stack, we read the value from memory at position rsp into register A and then increment rsp. Note, this does still leave the old value in memory, and that’s okay. Also note that our stack grows downward in memory.

The next two actions (b = 2, 3) provide additional support for function calls.

The first (b=2) is the function call. It adds 2 to the current pc and writes that value to the stack (also updating rsp). Consider: why might we write pc+2 to the stack instead of pc? It then performs an unconditional jump to the address given as the immediate value following the instruction (i.e., the next byte of our program). This is essentially calling the function.
The second (b=3) is the function return. It reads the top value from the stack (also updating rsp) and sets that value as the new pc. This is an unconditional jump to the address stored at the top of the stack. Use this at the end of the function to return back to the code that called it.

Aside: One thing to note here is that we are assuming that the top value of the stack, when executing instruction 83, is the address that called the current function. If the called function pushes any values onto the stack, it must pop them before returning, or else we may return to an unexpected position in memory.

We will see more of how the stack works, including the handling of return addresses, parameters, and return values, as we discuss x86-64 assembly in greater detail.

Function Call Syntax

We need more than just the address of our function code and the address to return when the function is complete. We also need parameters. Consider the following function call (in python-like syntax):

z = modulo(a, n)

We need to know:
- The location of our function modulo, i.e., the address in memory where that code starts
- The location of where a and n are stored
- A place to put the result, z

Our ISA will define the following calling conventions for functions. Calling conventions determine where the parameters and return values are located and also include any requirements on the registers that the function call is not using.

Instructions may have up to 2 arguments. They will be placed in registers 2 and 3; with the first parameter in register 2 and the second in register 3.
The return value for the function is stored in register 0.
Registers 0, 2, and 3 are considered volatile; that means that if the code calling the function wants to keep those values, it must save them to memory or push them onto the stack before calling the function (using instruction 82).
Register 1 is considered non-volatile; that means that if the function being called wants to use register 1, it must save the value to memory (i.e., push onto the stack) and restore it (i.e., pop from the stack) before returning to the caller (using instruction 83).

Updating our example above with these conventions, our function call would look something like the following (in python syntax):

r0 = modulo(r2, r3)

Or more specifically:

r2 = x
r3 = y
jump to (address of modulo() code)
z = r0

Therefore, when we want to make a function call, we will place the parameters in registers 2 and 3 before making the call with instruction 82. We cannot assume that the values in registers 2 and 3 will be the same when the function has completed and returns. When the function completes, it will jump back (using instruction 83) to the next instruction in our code to continue execution. Immediately after the function returns, we can safely assume that the function’s return value has been stored in register 0 (overwriting the value previously stored in that register before the function call).

What about r1? What does non-volatile mean? This means that if we were using r1 before we jumped to our modulo() code, then it should still be the same afterwards. So, for example, if we wanted to pre-load our destination address 0xD0:

r1 = 0xD0
r2 = x
r3 = y
jump to (address of modulo() code)
z = r0
// r1 should still have D0

It appears that the function’s code is not allowed to use register 1. But it can! It must simply stash the value away somewhere for safe-keeping, then bring it back out before the function returns. We also now have a place to store values like this: the stack. So our function can do something like:

push r1
// all the code to calculate modulo, r0 = r2 * r3,
// using r1 as needed
pop r1 // put r1's value back
return // use reserved=1, icode=0, b=3 to jump back

So remember: register 1 will have its same value after the function returns, but registers 0, 2, and 3 may have changed.

Your tasks

This homework is divided into two separate tasks:

Update the simulator from Lab 4 to provide the stack and new instructions
Use the new instructions to write a binary program that computes the greatest common divisor of the input values

Updating Our Simulators

We must first update our simulators from Lab 5 to support the stack and this new instruction.

Add the rsp to your code. In our simulators, we will initialize the stack pointer to a location late in memory (0xFF) and create a stack that grows downward in memory. There are trade offs for this choice (if our stack gets too big, it might overwrite our program!).
1. If you are using Java, update the following section of SimBase.java:
```
 // memory and registers
 public byte[] M;
 public byte[] R;
 public byte rsp = (byte) 0xFF; // new!
```
2. If you are using Python, update the following section of sim_base.py:
```
 # initialize memory and registers
 R = [0 for i in range(4)]
 M = [0 for i in range(256)]
 rsp = 0xFF # new!
```
Add the new instruction defined above (when the reserved bit is 1 and the icode is 0) to your execute() method. Note: it is up to the binary that our simulators execute to properly read the function parameters from memory and write return values to register 0. Your simulator should correctly update rsp and provide the jumps.
If the reserved bit is 1 and the icode is not 0, then set the next pc to the current pc instead of advancing it and do nothing else.

Greatest Common Divisor

Next, you will write a binary program gcd.binary using this new icode and associated operations.

Modify your modulo code from HW3 to be used as a function in a new program. You will include the updated version of this code at the end of your program below.
- Instead of needing to read the immediate values at bytes 0x01 and 0x03, it can assume the two values are already stored in registers 2 and 3 (respectively).
- You should save the contents of register 1, since it should be unchanged when your function returns. Just before returning, save the value back to register 1.
- Instead of storing the result at memory location D0, return the result to the caller. That is, store the result in register 0 and use the return instruction (83).
Write a new binary program that computes the greatest common divisor of the inputs (i.e., gcd((value at 0x01), (value at 0x03)) and stores the result at address 0xE0. Your new code should:
- Load the value in memory at address 0x01 into a register
- Load the value in memory at address 0x03 into a register
- Calculate the greatest common divisor, gcd(a, b) defined as:
  - gcd(a,b) is the largest positive integer that divides both a and b. See Calculating GCD below for more details; a more complete definition of the Euclidean algorithm for GCD is also available on Wikipedia
  - Note, you must call your original modulo code to compute a mod b. (hint: you will need the address where your modulo() code starts)
- Store the result (the GCD) at address 0xE0
- Halt once it is done.
Include your updated modulo code (from step 1 above) at the end of your new instructions. Note: you will need to know the starting address of your modulo code to call it, that is, to jump to it.

You may assume that the input values will not be negative, but they may be 0. We also define that greatest common divisor of x and 0 is x. You may also assume that the first value (in 0x01) is always larger than the second value (in 0x03) except in the case of 0.

Thus, if gcd.binary begins __ 10 __ 04 then when it is finished it should have 04 in address 0xE0; that is gcd(0x10,0x04) = gcd(16,4) = 4.

Note: We should be able to change the second and fourth bytes of your program to compute the greatest common divisor of other values as well. That is, consider the values in bytes 0x01 and 0x03 of your program to be immediate values. The autograder will overwrite literal values to these bytes to test your program.

Calculating GCD

The Euclidean algorithm for calculating greatest common divisor uses the modulo operation. Assuming a ≥ b, this algorithm makes use of the following observation:

gcd(a,b) = gcd(b, modulo(a,b))

The algorithm repeatedly applies the modulo operation until the result is 0. The greatest common divisor is smallest result seen before 0. For example:

gcd(66, 18) = gcd(18, modulo(66, 18)) = gcd(18, 12)
            = gcd(12, modulo(18, 12)) = gcd(12, 6)
            = gcd(6, modulo(12, 6))   = gcd(6, 0)
            = 6

Testing Your GCD Binary

To test your code, do one of

python3 sim_base.py gcd.binary

java SimBase gcd.binary

or going to our updated online simulator and click the file upload button at the top of the page to load your gcd.binary into the simulator’s memory.

Hints, tips, and suggestions

How to write binary

We suggest following these steps, carefully, saving the result of each in a file so you can go back and fix them if they were wrong:

Write pseudocode that does the desired task
Convert any for loops to while loops with explicit counters
Change any if or while guards to the form something <= 0
- a <= b becomes a-b <= 0
- a < b becomes a+1 <= b becomes a+1-b <= 0
- a >= b becomes 0 >= b-a becomes b-a <= 0
- a > b becomes 0 > b-a becomes b+1-a <= 0
- a == b becomes a-b == 0 becomes !(a-b) == 1 becomes !!(a-b) <= 0
- a != b becomes a-b != 0 becomes !(a-b) == 0 becomes !(a-b) <= 0
Add more variables to split multi-operation lines into a series of single-operation lines
Add more operations to convert ones not in the instruction set into ones in the instruction set
Change each loop into a pair of instructions, opening with “spot1 = pc” and closing with “if …, goto spot1”
Count the number of variables needed
- If¹ it is ≤ 4, skip to step 10
- else², continue with next step
Pick a memory address for each variable. Make these big enough your code is unlikely to get that big; for example, you might pick 0x80 though 0x80 + number of variables
Convert each statement that uses variables into
1. register ← load variable’s memory
2. original statement
3. store variable’s memory ← register
translate each instruction into numeric (icode, a, b) triples, possibly followed by a M[pc+1] immediate value
turn (icode, a, b) into hex
Write all the hex into gcd.binary

Debugging binary is hard. That’s part of why we don’t generally write code in binary. If you get stuck, you should probably try pulling just the part you are stuck on separate from the rest and test it until it works, then put it back in the main solution.

Submit

Submit both your gcd.binary and SimBase.java or sim_base.py files via Gradescope.

depending on how you write your original code, this is possible for this task … ↩
… some solutions are in this case instead. ↩

HW4 - Greatest Common Divisor

Adding a Stack

The (Updated) Instructions

icode Behaviors

One more icode

Function Call Syntax

Your tasks

Updating Our Simulators

Greatest Common Divisor

Calculating GCD

Testing Your GCD Binary

Hints, tips, and suggestions

How to write binary

Submit

`icode` Behaviors