# CS0 2130

Overview

#### Daniel G. Graham PhD





- 1. Discuss the ideas from our last 14 lectures.
- 2. Only 25 lectures left after this one. (Yeah, it goes by fast)
- 3. Topics, and tools for continuing your hardware journey
- 4. Now we move to software x86 assembly and C



https://github.com/MKrekker/SINGLE-CYCLE-RISC-V







#### THIS WERE WE'LL START OUR JOURNEY







5

#### WHAT ARE LOGIC GATES

- Logic gates are circuits that perform logic functions
  - such as AND, OR, (NOT), etc
- Logic gates have different symbols and their behavior is normally described using a truth table.



.

#### SINGLE INPUT VS TWO INPUT GATES

NOT



 $Y = \overline{A}$ 

Α

1





Y = A + B





#### **MORE LOGIC GATES**

| XOR              |   | NAND                                                   |   |        | NOR                         |   | XNOR |  |   |   |   |   |   |   |   |
|------------------|---|--------------------------------------------------------|---|--------|-----------------------------|---|------|--|---|---|---|---|---|---|---|
|                  |   | A –                                                    |   | A<br>B | $()  ()  \mathbf{V}$        |   |      |  |   |   |   |   |   |   |   |
| $Y = A \oplus B$ |   | $Y = \overline{AB} \qquad \qquad Y = \overline{A + B}$ |   | 3      | $Y = \overline{A \oplus B}$ |   |      |  |   |   |   |   |   |   |   |
|                  | Α | В                                                      | Y |        | Α                           | В | Y    |  | A | В | Y |   | A | В | Y |
| _                | 0 | 0                                                      |   |        | 0                           | 0 |      |  | 0 | 0 |   | - | 0 | 0 |   |
|                  | 0 | 1                                                      |   |        | 0                           | 1 |      |  | 0 | 1 |   |   | 0 | 1 |   |
|                  | 1 | 0                                                      |   |        | 1                           | 0 |      |  | 1 | 0 |   |   | 1 | 0 |   |
|                  | 1 | 1                                                      |   |        | 1                           | 1 |      |  | 1 | 1 |   |   | 1 | 1 |   |



#### PULL UP PULL DOWN NETWORKS





#### NOT GATE





| A | <b>P1</b> | N1 | Y |
|---|-----------|----|---|
| 0 |           |    |   |
| 1 |           |    |   |





NAND gates are Turning complete you can build all other gates from them



# NAND



# $Y = \overline{AB}$













Levels D

Level Help

#### Nand

Your task is to connect inputs to output through wires and relays such that when both **a** and **b** inputs are 1, the output is 0.

Solve Level

1 represents electrical current, **0** represents no current.

The **V** input carries constant current, i.e. always 1.

The exact specification:





Input:

俞

OK

Nand

#### Welcome to The Nand Game!

You are going to build a computer starting from basic components.

 $\times$ 

The game consists of a series of levels. In each level, you are tasked with building a component that behaves according to a specification. This component can then be used as a building block in the next level.

The game does not require any previous knowledge about computer architecture or software, and does not require math skills beyond addition and subtraction. (It does require some patience—some of the tasks might take a while to solve!)

Your first task is to build a nand component.

On the left of the diagram is the exact specification of the task. Click "Level Help" for further information which might be helpful. THE NAND GAME



| A | B | С | Q |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

### WHAT IS THE OUTPUT OF THIS CIRCUIT?







https://github.com/MKrekker/SINGLE-CYCLE-RISC-V



#### 1 BIT MUX









#### 2 BIT MUX









#### 2 BIT MUX





#### THE IDEA





#### THE CHALLENGE

Our gates only support 0 and 1s.

How can we represent other decimal numbers?

How can we present negative numbers?

What about fractions  $\odot$ ?

#### **BINARY**

$$1101_{2} = 1 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} = 13_{10}$$



#### **4-BIT ADDER**





| Signed | Bits |
|--------|------|
| 0      | 0000 |
| 1      | 0001 |
| 2      | 0010 |
| 3      | 0011 |
| 4      | 0100 |
| 5      | 0101 |
| 6      | 0110 |
| 7      | 0111 |
| -0     | 1000 |
| -1     | 1001 |
| -2     | 1010 |
| -3     | 1011 |
| -4     | 1100 |
| -5     | 1101 |
| -6     | 1110 |
| -7     | 1111 |

| Unsigned |
|----------|
| 0        |
| 1        |
| 2        |
| 3        |
| 4        |
| 5        |
| 6        |
| 7        |
| 8        |
| 9        |
| 10       |
| 11       |
| 12       |
| 13       |
| 14       |
| 15       |

### Sign Bit



| Signed | Bits |  |
|--------|------|--|
| -7     | 0000 |  |
| -6     | 0001 |  |
| -5     | 0010 |  |
| -4     | 0011 |  |
| -3     | 0100 |  |
| -2     | 0101 |  |
| -1     | 0110 |  |
| 0      | 0111 |  |
| 1      | 1000 |  |
| 2      | 1001 |  |
| 3      | 1010 |  |
| 4      | 1011 |  |
| 5      | 1100 |  |
| 6      | 1101 |  |
| 7      | 1110 |  |
| 8      | 1111 |  |

| Unsigned |
|----------|
| 0        |
| 1        |
| 2        |
| 3        |
| 4        |
| 5        |
| 6        |
| 7        |
| 8        |
| 9        |
| 10       |
| 11       |
| 12       |
| 13       |
| 14       |
| 15       |

### Bias

$$Floor((2^{n} - 1)/2) = 7$$



| Signed | Bits |  |
|--------|------|--|
| 0      | 0000 |  |
| 1      | 0001 |  |
| 2      | 0010 |  |
| 3      | 0011 |  |
| 4      | 0100 |  |
| 5      | 0101 |  |
| 6      | 0110 |  |
| 7      | 0111 |  |
| -8     | 1000 |  |
| -7     | 1001 |  |
| -6     | 1010 |  |
| -5     | 1011 |  |
| -4     | 1100 |  |
| -3     | 1101 |  |
| -2     | 1110 |  |
| -1     | 1111 |  |

| Unsi | gned |
|------|------|
| 0    | )    |
| 1    |      |
| 2    |      |
| 3    |      |
| 4    | :    |
| 5    |      |
| 6    |      |
| 7    |      |
| 8    | •    |
| 9    |      |
| 1    | 0    |
| 13   | 1    |
| 1:   | 2    |
| 13   | 3    |
| 14   | 4    |
| 1!   | 5    |

## Two's Complement



| Hex Digit | Decimal | Binary |
|-----------|---------|--------|
| 0         | 0       | 0000   |
| 1         | 1       | 0001   |
| 2         | 2       | 0010   |
| 3         | 3       | 0011   |
| 4         | 4       | 0100   |
| 5         | 5       | 0101   |
| 6         | 6       | 0110   |
| 7         | 7       | 0111   |
| 8         | 8       | 1000   |
| 9         | 9       | 1001   |
| А         | 10      | 1010   |
| В         | 11      | 1011   |
| С         | 12      | 1100   |
| D         | 13      | 1101   |
| E         | 14      | 1110   |
| F         | 15      | 1111   |

#### HEXADECIMAL

Convert 00101110 to hexadecimal Answer: 2E

Group them 0010 = 2 1110 = E Final 0x2E

- Some programming languages uses prefixes
  - Hex: Ox
    - 0x23AB = 23AB<sub>16</sub>
  - Binary: Ob
    - 0b1101 = 1101<sub>2</sub>



#### BITWISE OR |

### 1100<sub>2</sub> 0110<sub>2</sub> 1110<sub>2</sub>

#python example
x = 12
y = 6
z = x | y
print(z)
#prints 14



#### BITWISE OR XOR ^

## 1100<sub>2</sub> ^ 0110<sub>2</sub> 1010<sub>2</sub>

#python example
x = 12
y = 6
z = x ^ y
print(z)
#prints 10



#### **FLIPPING BITS**

File the second bit of x.  $1 \Rightarrow 0$  and  $0 \Rightarrow 1$ 

1100<sub>2</sub> ∧ 0010<sub>2</sub>

1110<sub>2</sub>

What if the nth bit was 1 instead?



### MASKING (EXTRACTING BITS)

The Idea of masking with can extra a certain section of bits by anding.

11011100<sub>2</sub> & 00001111<sub>2</sub>





Lower 4 bits extracted





Same as just xoring each bit  $0 \oplus 0 \oplus 1 \oplus 0 = 0$ 



#### PARALLEL EVALUATION

Observe that xor is both transitive and associative; thus we can re-write

 $x0 \oplus x1 \oplus x2 \oplus x3 \oplus x4 \oplus x5 \oplus x6 \oplus x7$ 

using transitivity as x0⊕x4⊕x1⊕x5⊕x2⊕x6⊕x3⊕x7

and using associativity as  $(x0 \oplus x4) \oplus (x1 \oplus x5) \oplus (x2 \oplus x6) \oplus (x3 \oplus x7)$ 

and then compute the contents of all the parentheses at once via  $x \wedge (x >>4)$ .

#### PARALLEL EVALUATION

#### x0 $\oplus$ x1 $\oplus$ x2 $\oplus$ x3 $\oplus$ x4 $\oplus$ x5 $\oplus$ x6 $\oplus$ x7 using transitivity as

 $x0 \oplus x4 \oplus x1 \oplus x5 \oplus x2 \oplus x6 \oplus x3 \oplus x7$ 

and using associativity as  $(x0 \oplus x4) \oplus (x1 \oplus x5) \oplus (x2 \oplus x6) \oplus (x3 \oplus x7)$ 

and then compute all at once via x ^ (x>>4).

x ^= (x>>16) x ^= (x>>8) x ^= (x>>4) x ^= (x>>2) x ^= (x>>1) parity = (x & 1)

### **ENDIANNESS**

|          |    |    | +  |    |    |    |    |    |
|----------|----|----|----|----|----|----|----|----|
| 00000010 | 00 | 00 | 5C | 00 | 00 | 00 | 90 | 00 |
| 00000020 | 00 | 00 | 40 | 9B | 00 | 00 | 23 | 2E |
| 00000030 | 00 | 00 | 00 | 00 | 00 | 00 | 42 | 47 |
| 00000040 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000050 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000060 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000070 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000080 | EB | DF | D5 | E2 | D6 | СС | DE | D2 |

Little ENDIAN 0x0000005C (92<sub>10</sub>)

# Less significant at Lowest address

### **ENDIANNESS**

| 00000010 | 00 | 00 | 5C | 00 | 00 | 00 | 90 | 00 |
|----------|----|----|----|----|----|----|----|----|
| 00000020 | 00 | 00 | 40 | 9B | 00 | 00 | 23 | 2E |
| 00000030 | 00 | 00 | 00 | 00 | 00 | 00 | 42 | 47 |
| 00000040 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000050 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000060 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000070 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 00000080 | EB | DF | D5 | E2 | D6 | СС | DE | D2 |

Big ENDIAN 0x5C000000 (1543503872<sub>10</sub>)

Most significant Byte at lowest address

#### **ENDIANNESS**





## **FLOATING POINT**

- How can we represent decimal values in binary?
- Why do errors like these occur?

>>> 0.1 + 0.1 + 0.1 == 0.3
False
>>> (0.1 + 0.1 + 0.1) == 0.3
False
>>> x = 0.1 + 0.1 + 0.1
>>> x
0.300000000000004

Floating point rounding error

>>> 0.3 + 0.3 + 0.3 >>> 0.89999999999999999







# number = sign(1 + Mantissa) x 2<sup>exponent - bias</sup>

On 32 bit machines bias in normal 127 (Yes this is bias representation we talked about earlier)



#### **BINARY STRING**

# $0.1101 = 1.101 \times 2^{-1}$

Keep going until you get to your first 1.

# $0.01101 = 1.101 \times 2^{-2}$

# $0.001101 = 1.101 \times 2^{-3}$



## **CONVERSION EXAMPLE**

Let's convert 0.8125 to floating-point representation

0.8125 x 2=1.6250 1 0.6250 x 2=1.2500 1 0.2500 x 2=0.5000 0 0.5000 x 2=1.0000 1



 $1 \times 1/2 + 1 \times 1/4 + 0 \times 1/8 + 1 \times 1/16 = 13/16$ 

0.1101 = 1.101 x 2<sup>-1</sup>



### **CONVERSION EXAMPLE PART 2**

Sign: 0 Mantissa: 101 Exponent: -1 + 127 = 126(d) = 1111110(b)

0 01111110 1010000 0000000 00000000



# **CONVERSION PART 3**





# GREAT NOW WE HAVE ALL WE NEED TO THINK ABOUT DESIGNING OUR ADDER



### **4-BIT ADDER**



Great now let's build it with gates.



# ADDING

Let start by building a half adder something that just adds two bits.

Let's build a truth table.

| А | В | A + B | C.out |
|---|---|-------|-------|
| 0 | 0 | 0     | 0     |
| 0 | 1 | 1     | 0     |
| 1 | 0 | 1     | 0     |
| 1 | 1 | 0     | 1     |

We can implement A + B with an XOR gate And the C.out (Carry out) With an AND gate





1111

0111

+1011

0010

← Carries



# HALF ADDER DEMO



#### https://tinyurl.com/ygpea8v4

http://www.falstad.com/circuit/circuitjs.html?ctz= CQAgjCAMB0l3BWc0FwCwCY0HYEA4cEMElURTJy BTAWjDACgwE0QMs21KBmANj06VKGKOSZl2rMGl Z8B01sNEIGAGXAZ5vSnkphtbUQDMAhgBsAzlXJQ 1GgZJC62HEZVOXrSSAwDu9lykDRx9fWEOcIDQ8AMwTUDov1il1kcQ5PitPQBOESiYsDyU 8GLiXlswsoQK9JrK0vzgyIMfAFkQOXAZEDR9brS2F AYOrqxKPtquQwxhoA

# ADDING

Half Adder

| C <sub>out</sub> |   | B<br> <br>+<br> <br>S |        |
|------------------|---|-----------------------|--------|
| A                | В | C <sub>out</sub>      | S      |
| 0<br>0           | 0 | 0                     | 0<br>1 |
| 0                | 1 | 0                     | 1      |
| 1                | 0 | 0                     | 1<br>0 |
| 1                | 1 | 1                     | 0      |
|                  |   | = A ⊕<br>= AB         | В      |



We can implement A + B with an XOR gate And the C.out (Carry out) With an AND gate



| Half<br>Adder                |                                                                      |                  | Full<br>Adder                                                    |   |                 |   |   |                  |   |
|------------------------------|----------------------------------------------------------------------|------------------|------------------------------------------------------------------|---|-----------------|---|---|------------------|---|
| $C_{out} \xrightarrow{A  B}$ |                                                                      |                  | $C_{out} \xrightarrow{A} B$ $C_{out} \xrightarrow{I} C_{in}$ $S$ |   |                 |   |   |                  |   |
| А                            | В                                                                    | C <sub>out</sub> | S                                                                |   | $C_{\text{in}}$ | А | В | $C_{\text{out}}$ | S |
| 0                            | 0                                                                    | 0                | 0                                                                | - | 0               | 0 | 0 | 0                | 0 |
| 0                            | 1                                                                    | 0                | 1                                                                |   | 0               | 0 | 1 | 0                | 1 |
| 1                            | 0                                                                    | 0                | 1                                                                |   | 0               | 1 | 0 | 0                | 1 |
| 1                            | 1                                                                    | 1                | 0                                                                |   | 0               | 1 | 1 | 1                | 0 |
|                              |                                                                      |                  |                                                                  |   | 1               | 0 | 0 | 0                | 1 |
| S                            | 5                                                                    | = A ⊕ E          | 3                                                                |   | 1               | 0 | 1 | 1                | 0 |
| C                            | Cout :                                                               | = AB             |                                                                  |   | 1               | 1 | 0 | 1                | 0 |
|                              | 041                                                                  |                  |                                                                  |   | 1               | 1 | 1 | 1                | 1 |
|                              | $S = A \oplus B \oplus C_{in}$<br>$C_{out} = AB + AC_{in} + BC_{in}$ |                  |                                                                  |   |                 |   |   |                  |   |



Note on special case 3 input xor. Draw the three gates. Really Three xors stacked.

> UNIVERSITY VIRGINIA

ENGINEERING

Full Adder



 $C_{out} = AB + AC_{in} + BC_{in}$ 

C.out has been rewritten to reduce the number of gates needed.





← Carries

1111

+101

111

0111

## **RIPPLE CARRY ADDER**

Next let's build a full adder







# **RIPPLE CARRY ADDER**

| 1111          | $\leftarrow$ Carries |
|---------------|----------------------|
| 0111          |                      |
| + <u>1011</u> |                      |
| 0010          |                      |





https://github.com/MKrekker/SINGLE-CYCLE-RISC-V



#### **CLOCKS EDGES**

**Rising Edge** 





#### **CLOCKS EDGES**



We will build a single cycle machine it will complete all the computation in a single cycle

> UNIVERSITY VIRGINIA

ENGINEERING

# USING RING OSCILLATORS TO GENERATE CLOCKS

A clock is something that produces a periodic signal



Let's walk through an example assume that Q starts of as 0

Frequency = 1/(2\*t\*n)

Where t is time delay of an inverter and n is number of inverters



# **STORING SINGLE**

#### Goal

- 1. Understand the behavior of a positive edgetriggered D flip-flop.
  - How do we store a bit
  - What happens when the clock changes
  - What does it mean to be a positive edge triggered flip flop
  - What is Q and Q

|         | D                | Q |  |
|---------|------------------|---|--|
| Clock—— | $\triangleright$ | Q |  |

# **BUILT SIMULATOR VERSION**



Use this link to experiment with the flipflop during lecture. Try different things and see how it works

#### https://tinyurl.com/2dhk5kvg

http://www.falstad.com/circuit/circuitjs.html?ctz= CQAgjCAMB0l3BWcMBMcUHYMGZIA4UA2ATmlx AUgoqoQFMBaMMAKDASUPxABZsUQGPD178oFF gHcQXPKIE88VPgMhSZ3HoRGLl2qCwAyvJfN6KzVC ADMAhgBsAznWpqASib064vfRAFgPijQSMFIVDAIL ACygpA6YkrKYIRhLAD21PrKkKSu0BBWIADyAK4AL gAOFRngMiI5eeGw8GSECIQo4SABINggAJYAdtXlt QLZvLnE+fC5GO2d3QIC-QDG9ulrANYsQA

# THE FLIP FLOP HOLD HOLDS THE VALUE FOR A CLOCK CYCLE





## **BUILDING A REGISTER FROM FLIP FLOPS**



#### Removed Q (bar) for reability

#### **REGISTER SYMBOLS**





# **3-BIT COUNTER**

Let's put it all together and build a 3-bit counter

Circuit that counts from

000,

001,

010,

011,

100,

101,

110,

111







## **PROGRAM COUNTER**





#### **MEMORY COMPONENTS OF A PROCESSOR**





# **REGISTER FILE**

- Temporary storage location
- Stores immediately needed variables
- External interface
  - Addresses: A1, A2, A3
  - Data: RD1, RD2, WD3
  - Enable: WE3
  - Clock: CLK



### **READ FROM A REGISTER FILE**





# **DEMULTIPLEXER (DEMUX)**

#### Example: 1:2 Demux



- Connects one input to one of the **N** outputs
- Select input is log<sub>2</sub>N bits control input





### WRITE TO A REGISTER FILE





## **INSTRUCTION MEMORY**

- Stores the program
- Read data (RD) for a given address (A)



For this class, we will assume we cannot write to Instruction Memory.



# DATA MEMORY

- Contains data needed by the program
- Read data (RD) from a given address (A)
- Write data (WD) to a given address (A)



### **EXERCISE**

Answer:

| 00000000 | 50 | 01 | 02 | 03 | 04 | 05 | <b>0</b> 8 | 0D  | 15         | 22 | 37 | 46 | FF | AA | C2 | 34 |
|----------|----|----|----|----|----|----|------------|-----|------------|----|----|----|----|----|----|----|
| 000000D0 | 3D | 18 | 55 | 6D | C2 | 2F | F1         | 20  | 11         | 31 | 42 | 73 | B5 | 28 | DD | 05 |
| 000000E0 | E2 | 27 | С9 | B0 | 79 | 29 | A2         | СВ  | 6D         | 38 | A5 | DD | 82 | 5F | E1 | 40 |
| 000000F0 | 21 | 72 | 83 | E3 | 12 | 34 | 56         | -78 | <b>A</b> 3 | 87 | 39 | D0 | 09 | DF | E4 | B5 |

# **MEMORY HIERARCHY**



Levels in Speed and Capacity of the Memory

Higher levels larger and slower memory

#### Memory Hierarchy Design

Figure from: https://www.geeksforgeeks.org/memory-hierarchy-design-and-its-characteristics/







# **ALU SYMBOL AND INPUTS**





# TOY ISA AND PROCESS VERSION 0.1 WE'LL MAKE IT BETTER DON'T WORRY



# TINY PROGRAM TO ASSEMBLY

m = 4 x = 2 b = -1 $y = m^*x^*b$  Looks like we need two types on instructions

- 1. An instruction to load values
- 2. An instruction to computation (multiply)

# LET'S DECIDE HOW WE ARE GOING TO LAYOUT OUR BITS

1. An instruction to load values into **<u>Registers</u>** 







# NOW LET'S TRANSLATE OUT PROGRAM TO ONES AND ZERO





|   | icode | b | meaning                                                                       |                       |
|---|-------|---|-------------------------------------------------------------------------------|-----------------------|
| - | 0     |   | rA = rB                                                                       | FULL ISA              |
|   | 1     |   | rA + = rB                                                                     |                       |
|   | 2     |   | rA &= rB                                                                      |                       |
|   | 3     |   | <b>rA</b> = read from memory at address <b>rB</b>                             | Let look at each      |
|   | 4     |   | write ${f r}{f A}$ to memory at address ${f r}{f B}$                          | <pre>_ of these</pre> |
| - | 5     | 0 | $rA = \sim rA$                                                                | instructions          |
|   |       | 1 | $\mathbf{r}\mathbf{A} = -\mathbf{r}\mathbf{A}$                                | instructions          |
|   |       | 2 | rA = !rA                                                                      |                       |
|   |       | 3 | rA = pc                                                                       |                       |
| _ | 6     | 0 | <b>rA</b> = read from memory at <b>pc</b> + 1                                 |                       |
|   |       | 1 | <b>rA</b> += read from memory at <b>pc</b> + <b>1</b>                         |                       |
|   |       | 2 | rA &= read from memory at pc + 1                                              |                       |
|   |       | 3 | $\mathbf{rA}$ = read from memory at the address stored at $\mathbf{pc}$ + $1$ |                       |
| _ |       |   | For icode 6, increase <b>pc</b> by 2 at end of instruction                    |                       |
| - | 7     |   | Compare ${f r}{f A}$ as 8-bit 2's-complement to ${f 0}$                       |                       |
|   |       |   | if <b>rA &lt;= 0</b> set <b>pc = rB</b>                                       |                       |
|   |       |   | else increment <b>pc</b> as normal                                            |                       |
|   |       |   |                                                                               |                       |

# **GREAT WE HAVE OUR FIRST INSTRUCTION**

| XXX | RA | Value |
|-----|----|-------|
|-----|----|-------|

RA = Value



# AUTOMATICALLY FETCH A NEW INSTRUCTION



n-bit PC







Our program would have loaded values into the register file

R0 =3 R1 = 2 R2 = -1



### **OPCODE**





Finally, we need an opcode to distinguish our load instruction from our multiple

0 --> Multiply 1 --> Save Value to register



# **ENCODING**





# **BUILDING MACHINE TO COMPUTE THIS**





# NOTE WE ALSO NEED TO UPDATE THE ENCODING OF OUR LOADS







# INSTEAD GOING INSTRUCTION BY INSTRUCTION LET'S DESIGN THE ISA AND THE MACHINE



# **TOY INSTRUCTION SET ARCHITECTURE (ISA)**

The ISA defines:

- 1. Instructions and their layout
- 2. Data types
- 3. Registers we'll have





How instructions are laid out in our ISA



# **ENCODING OUR FIRST INSTRUCTION**

Try to encode the following instruction R0 = R1





| icode | b | Behavior                                                      |
|-------|---|---------------------------------------------------------------|
| 0     |   | rA=rB                                                         |
| 1     |   | rA+=rB                                                        |
| 2     |   | rA&=rB                                                        |
| 6     | 0 | rA=read from memory at pc + 1<br>Also written as rA = M[pc+1] |



Notice that we have to increment the Program Counter by **two** for these instructions. Because they are two bytes long while the other instructions are only 1 byte





## THE FLOW





#### **Toy ISA Simulator**

Choose File no file selected









| icode | b | meaning                                                                     |                     |
|-------|---|-----------------------------------------------------------------------------|---------------------|
| <br>0 |   | rA = rB                                                                     | FULL ISA            |
| 1     |   | rA + = rB                                                                   |                     |
| 2     |   | rA &= rB                                                                    |                     |
| 3     |   | <b>rA</b> = read from memory at address <b>rB</b>                           | We'll give the      |
| 4     |   | write ${f r}{f A}$ to memory at address ${f r}{f B}$                        | — full description  |
| <br>5 | 0 | $rA = \sim rA$                                                              | of ISA at the       |
|       | 1 | rA = -rA                                                                    |                     |
|       | 2 | rA = !rA                                                                    | begin of every      |
|       | 3 | rA = pc                                                                     | exam. In fact this  |
| 6     | 0 | rA = read from memory at pc + 1                                             | — a picture of what |
|       | 1 | $\mathbf{rA}$ += read from memory at $\mathbf{pc}$ + 1                      | we will give you.   |
|       | 2 | rA &= read from memory at pc + 1                                            |                     |
|       | 3 | $\mathbf{rA}$ = read from memory at the address stored at $\mathbf{pc}$ + 1 |                     |
|       |   | For icode 6, increase <b>pc</b> by 2 at end of instruction                  |                     |
| 7     |   | Compare ${f r}{f A}$ as 8-bit 2's-complement to 0                           |                     |
|       |   | if <b>rA &lt;= 0</b> set <b>pc = rB</b>                                     |                     |
|       |   | else increment <b>pc</b> as normal                                          |                     |

# **READ FROM MEMORY ADDRESS STORED IN RB**

Registers

| Registers |    | 0  | 1  | 2  | 3  | 4 | 5 | 6 | 7 | 8 | 9 | Α | В | С | D | E | F |
|-----------|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---|---|
| R0 X      | 00 | 64 | 23 | 31 |    |   |   |   |   |   |   |   |   |   |   |   |   |
| R1 X      | 10 |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |   |
| R2 X      | 20 |    |    |    | FF |   |   |   |   |   |   |   |   |   |   |   |   |
| R3 X      | 30 |    |    |    |    |   |   |   |   |   |   |   |   |   |   |   |   |



What are the values of R0 and R1. Once program completes?



# **REGISTER SPILLING**

Because we have a limited number of registers, we can't store all variables in registers, so we must store some in memory and read them into a register when we need them. Here is the strategy

- 1. Read the register value to a predetermined location in memory.
- 2. Use the register
- 3. Write the register value back to memory, so that it can be used to store something else

| Architecture | 8 bit | 32 bit | 64 bit |
|--------------|-------|--------|--------|
| ARM          | Х     | 15     | 31     |
| Intel x86    | Х     | 8      | 16     |
| Toy ISA      | 4     | X      | X      |



# LET'S CALCULATE WHERE TO JUMP TO

Memory Address

Size of Instruction

| 0x00 | R0 = M[0x20]                 | 2 Bytes |
|------|------------------------------|---------|
| 0x02 | $R1 = \frac{0 \times 07}{0}$ | 2 Bytes |
| 0x04 | If RO <= 0 set PC= R1        | 1 Byte  |
| 0x05 | R0 += 1                      | 2 Bytes |
| 0x07 | R0 &= 2                      | 2 Bytes |

So what address do we want R1 to be?



# WRITE A LOOP

First, rewrite as a do-while loop. (This due to limitation in Toy ISA) reasons will be clear later.



## WRITE A LOOP



But wait is that correct? Translating the condition can be tricky



## WRITE A LOOP



-3, -2, -1, 0, 1 (five times)



# SOME PERPECTIVE (RISC-V)

#### The RISC-V Instruction Set Manual Volume I: User-Level ISA Document Version 2.2

Editors: Andrew Waterman<sup>1</sup>, Krste Asanović<sup>1,2</sup> <sup>1</sup>SiFive Inc., <sup>2</sup>CS Division, EECS Department, University of California, Berkeley andrew@sifive.com, krste@berkeley.edu May 7, 2017

Available at: <u>https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf</u>



## **RISC VS CISC**

RISC-V ADD https://msyksphinz-self.github.io/riscvisadoc/html/rvi.html#addi X86 Add https://www.felixcloutier.com/x86/add



# NOW THAT WE HAVE OUR ISA LET'S DESIGN THE MACHINE



# INSTRUCTION MEMORY AND INSTRUCTION REGISTER

R icode RA RB

immediate



Instruction register (IR)

Our diagram is going to have several comments so I will not draw the IR

Note: input and output widths on the Instruction memory. The memory is byteaddressable but reads 2 bytes at a time

# **1 BYTE AND 2 BYTE INSTRUCTIONS**



We'll add a mux that will select passing one to adder or two.

The mux will be controlled with a control line  $C_0$ . But what component provides the control signal? Answer the Controller

## HARDWIRED CONTROL UNIT

| icode | b | C <sub>0</sub> | •••• | C <sub>n</sub> |
|-------|---|----------------|------|----------------|
| 6     | х | 1              |      |                |
|       |   |                |      |                |











|   | icode | b | meaning                                                                     |  |  |
|---|-------|---|-----------------------------------------------------------------------------|--|--|
| - | 0     |   | rA = rB                                                                     |  |  |
|   | 1     |   | rA + = rB                                                                   |  |  |
|   | 2     |   | rA &= rB                                                                    |  |  |
|   | 3     |   | <b>rA</b> = read from memory at address <b>rB</b>                           |  |  |
|   | 4     |   | write ${f r}{f A}$ to memory at address ${f r}{f B}$                        |  |  |
|   | 5     | 0 | $rA = \sim rA$                                                              |  |  |
|   |       | 1 | $\mathbf{r}\mathbf{A} = -\mathbf{r}\mathbf{A}$                              |  |  |
|   |       | 2 | rA = !rA                                                                    |  |  |
|   |       | 3 | rA = pc                                                                     |  |  |
| - | 6     | 0 | rA = read from memory at pc + 1                                             |  |  |
|   |       | 1 | $\mathbf{rA}$ += read from memory at $\mathbf{pc}$ + 1                      |  |  |
|   |       | 2 | rA &= read from memory at pc + 1                                            |  |  |
|   |       | 3 | $\mathbf{rA}$ = read from memory at the address stored at $\mathbf{pc}$ + 1 |  |  |
|   |       |   | For icode 6, increase <b>pc</b> by 2 at end of instruction                  |  |  |
| - | 7     |   | Compare <b>rA</b> as 8-bit 2's-complement to <b>0</b>                       |  |  |
|   |       |   | if $rA <= 0$ set $pc = rB$                                                  |  |  |
|   |       |   | else increment <b>pc</b> as normal                                          |  |  |



| 3 | <b>rA</b> = read from memory at address <b>rB</b> |
|---|---------------------------------------------------|
| 4 | write <b>rA</b> to memory at address <b>rB</b>    |

Looks like we have a conflict. Thoughts on how we could fix this?













| icod | е | b | meaning                                                    |
|------|---|---|------------------------------------------------------------|
| 0    |   |   | rA = rB                                                    |
| 1    |   |   | rA += rB                                                   |
| 2    |   |   | rA &= rB                                                   |
| 3    |   |   | <b>rA</b> = read from memory at address <b>rB</b>          |
| 4    |   |   | write <b>rA</b> to memory at address <b>rB</b>             |
| 5    |   | 0 | $rA = \sim rA$                                             |
|      |   | 1 | rA = -rA                                                   |
|      |   | 2 | rA = !rA                                                   |
|      |   | 3 | rA = pc                                                    |
| 6    |   | 0 | rA = read from memory at pc + 1                            |
|      |   | 1 | rA += read from memory at pc + 1                           |
|      |   | 2 | rA &= read from memory at pc + 1                           |
|      |   | 3 | rA = read from memory at the address stored at $pc$ + 1    |
|      |   |   | For icode 6, increase <b>pc</b> by 2 at end of instruction |
| 7    |   |   | Compare <b>rA</b> as 8-bit 2's-complement to <b>0</b>      |
|      |   |   | if rA <= 0 set pc = rB                                     |
|      |   |   | else increment <b>pc</b> as normal                         |

UNIVERSITY VIRGINIA

ENGINEERING

| 5 0 | $rA = \sim rA$ |                        |
|-----|----------------|------------------------|
| 1   | rA = -rA       | Draw out the flow here |
| 2   | rA = !rA       |                        |
| 3   | rA = pc        |                        |





















#### **OUR SINGLE CYCLE TOY PROCESSOR**





#### **OUR SINGLE CYCLE TOY PROCESSOR**



# WHAT ABOUT DETAILS OF THE MAIN MEMORY

- 1. How is it implemented?
- 2. How does it work underhood?
- 3. Don't worry we'll answer this in CSO 2.
  - It is actually a complex hierarchy including a controller, caches, and Hardware support for virtual memory like TLBS (translation lookaside buffers)
  - It doesn't always return a value in a single cycle so the controller might have to insert nops in the pipeline etc.



# **MEMORY HIERARCHY**



Levels in Speed and Capacity of the Memory

Higher levels larger and slower memory

#### Memory Hierarchy Design

Figure from: https://www.geeksforgeeks.org/memory-hierarchy-design-and-its-characteristics/



New 8-Core Intel<sup>®</sup> Core<sup>™</sup> i7 Processor Extreme Edition



Intel® Core™ i7-5960X Processor Extreme Edition Transistor count: 2.6 Billion Die size: 17.6mm x 20.2mm



UNIVERSITY VIRGINIA

ENGINEERING

\* 20MB of cache is shared across all 8 cores



UNIVERSITY VIRGINIA

ENGINEERING





# **DEFINING A NEW INSTRUCTION**

Let's create a new instruction that will both save the location to return and jump to the beginning of the function. We'll name this our **call** instruction

```
Save pc+2 , set pc = M[pc+1]
```

Let's also create an instruction that sets the PC back to the saved. We'll name this our return instruction or ret for short

pc = Saved Value



JNIVERSITY ″VIRGINIA

**ENGINEERING** 

# WHAT ABOUT FUNCTIONS



What about recursive functions? Functions that call themselves

Now we need to keep track of both the location return to (multiple function calls and the register state of function before the call)



# THE STACK

#### 0xFF

We are going to a region of memory that will hold the stack of function states and their associated return addresses.



By convention keep adding new things to the stack by growing it to lower addresses



### THE STACK







We'll also create two instructions that will add and remove values from the stack.

The push instruction will decrement the RSP and to the top of the stack

Example push(0x04)





Example push(0x04)

UNIVERSITY ENGINEERING



Example x = pop()





While the **pop** instruction returns the value at the top of the stack and **then** increments RSP

Example x = pop() returns 0x04

# WHAT ABOUT THE FUNCTION PARAMETERS

We need to define a calling convention. The rules that we'll follow when we call a function.

- For our simple processor functions are limited to 2 parameters.
- 2. The first parameter will be stored in R2
- 3. The second parameter will be stored in R3
- 4. The return value of the function will be stored in R0
- 5. If the function uses any other registers save them before modifying them and restore them before returning.

input = 0xFF

shiftAmount = 0x02

output = left\_shift(input, shiftAmount)



R2 = 0xFF R3 = 0x02 call left\_shift R0 //Contains result

> NIVERSITY Wirginia

ENGINEERING

# THOUGHT EXPERIMENTS

Could you implement the left\_shift function using our toy ISA?

output = left\_shift(input, shiftArmount)

Hint: Left shifts by 1 is equivalent to multiplying the number by 2.



# ISA EXTENDED BY SETTING R BIT TO 1

| icode | b | operation                                                                        |
|-------|---|----------------------------------------------------------------------------------|
| 0     |   |                                                                                  |
|       | 0 | Decrement rsp and push the contents of rA to the stack                           |
|       | 1 | Pop the top value from the stack into rA and increment rsp                       |
|       | 2 | Push pc+2 onto the stack, set pc = M[pc+1]                                       |
|       | 3 | pc = pop the top value from the stack<br>If b is not 2, update the pc as normal. |



# COULD YOU ADD PUSH, POP, CALL AND RET?





https://github.com/MKrekker/SINGLE-CYCLE-RISC-V



### WHAT ABOUT FABRICATING THESE

You can express our design in a programming language called VHDL.

Simulate your processor in model sim And then send off the TSMC, UMC, or Samsung to get fabricated.

Don't worry you'll not have to write VHDL in this course. But ECE does offer courses. Maybe I will rework or simulation lab to give us a taste of this language. 1signal and\_gate : std\_logic; 2and\_gate <= input\_1 and input\_2;</pre>

1entity example\_and is
2 port (
3 input\_1 : in std\_logic;
4 input\_2 : in std\_logic;
5 and\_result : out std\_logic
6 );
7end example and;

1architecture rtl of example\_and is
2 signal and\_gate : std\_logic;
3begin
4 and\_gate <= input\_1 and input\_2;
5 and\_result <= and\_gate;
6end rtl;</pre>

# THE MAP (THE CODE)





# THE MAP (THE CODE)

#include <stdio.h>
int main() {
 printf("Hello, World!");
 return 0;

We will not cover this conversion in detail. CS 4620 - Compilers is a class dedicated to building and understanding the program designed to do this conversion.

| 0000000000001149 <main>:</main> |                                 |
|---------------------------------|---------------------------------|
| 1149: f3 0f 1e fa               | endbr64                         |
| 114d: 55                        | push %rbp                       |
| 114e: 48 89 e5                  | mov %rsp,%rbp                   |
| 1151: 48 8d 05 ac 0e 00 00      |                                 |
| lea 0xeac(%rip),%rax            | # 2004                          |
| <_I0_stdin_used+0x4>            |                                 |
| 1158: 48 89 c7                  | mov %rax,%rdi                   |
| 115b: e8 f0 fe ff ff            | call 1050 <puts@plt></puts@plt> |
| 1160: b8 00 00 00 00            | mov \$0x0,%eax                  |
| 1165: 5d                        | pop %rbp                        |
| 1166: c3                        | ret                             |

We'll focus on understanding the output of the program and how this output gets executed on a machine







The Stack is a region of memory



#### **OUR JOURNEY SO FAR**





#### **RISC-V MACHINE**





#### **RISC-V MACHINE**



# THE MAP (THE CODE)

#include <stdio.h>
int main() {
 printf("Hello, World!");
 return 0;

We will not cover this conversion in detail. CS 4620 - Compilers is a class dedicated to building and understanding the program designed to do this conversion.

| 0000000000001149 <main>:</main> |                                 |
|---------------------------------|---------------------------------|
| 1149: f3 0f 1e fa               | endbr64                         |
| 114d: 55                        | push %rbp                       |
| 114e: 48 89 e5                  | mov %rsp,%rbp                   |
| 1151: 48 8d 05 ac 0e 00 00      |                                 |
| lea 0xeac(%rip),%rax            | # 2004                          |
| <_I0_stdin_used+0x4>            |                                 |
| 1158: 48 89 c7                  | mov %rax,%rdi                   |
| 115b: e8 f0 fe ff ff            | call 1050 <puts@plt></puts@plt> |
| 1160: b8 00 00 00 00            | mov \$0x0,%eax                  |
| 1165: 5d                        | pop %rbp                        |
| 1166: c3                        | ret                             |

We'll focus on understanding the output of the program and how this output gets executed on a machine

# THE ISA ALSO INCLUDES FLOATING LAYOUT SUPPORTED AND REGISTER AND THEIR DESCRIPTION

https://www.elsevier.com/\_\_data/assets/pdf\_file/0011/ 297533/RISC-V-Reference-Data.pdf

Let's look at the section that describes floating point And instruction encodings. Focus many on the second page

