Reputation: 343
I am really into understanding programming from the bottom up. So, I have learned the internal construction of a tiny 64kb computer, because I'm super interested in understanding computers from the transistor level. I understand transistors, the creation of multiplexers, decoders, creation of ALU, etc.
I get that for the LC-3, which is what I learned, opcodes like 0001 011 011 100001 mean that the 0001 will get decoded as an ADD instruction, etc. Yet I am confused as to how we can write assembly that leads to this. I understand that an assembler translates an instruction like ADD R3, R1, R2 into machine code, but what's really bugging me is how those ASCII characters get "interpreted" into the machine code.
I know at the electronic level how such an instruction is processed, like JMP changing the program counter, etc., but at the rudimentary level, how do the assembly instructions turn into machine code/binary? I do not get how it goes from assembly to machine code.
I couldn't find much online, but a theory I came up with is that the typed keys just send an electrical signal which is actually binary, yet I still don't get how the computer architecture turns this "ADD" into 0001, as it would need to understand the "ADD" in its entirety, not just the binary for each character. So, what is the process of turning the assembly into binary that can then control the logic gates, decoders, sign extension, etc.?
EDIT: For those asking which book I use, it's Introduction to Computing Systems: From Bits and Gates to C and Beyond, 2nd Edition (Patt). It goes from building logic gates from P/N transistors to assembly to C. I could not recommend it more for anyone who wants an overview of the entire process.
Upvotes: 7
Views: 2878
Reputation: 1128
Please refer to the comments above first, and you will understand that an assembly source file is not converted to binary at run time. (An assembler merely replaces strings with particular byte sequences!) Below I will add some explanation of how our PC executes native byte code.
We press the power button. The capacitor-related circuitry discharges/charges the CPU's reset pin.
The CPU resets itself. It sets its program counter to the start of the BIOS's boot-up program.
BIOS executes
BIOS does some essential things to operate our pc.
BIOS loads the boot loader to memory and call it.
The BIOS reads some bytes from the boot record and checks whether its last two bytes (bytes 511 and 512) are 0x55 0xAA, which is 01010101 10101010 in binary, to see whether the sector is a boot sector. If so, the BIOS loads the sector's contents to address 0x7C00 and jumps to 0x7C00.
Boot loader executes.
It initializes peripheral devices like the PIC, video card, etc. It sets up the A20 gate to tell the system that we want to use more than 1 MB of memory. It also loads the kernel modules that could not be loaded by the BIOS because of its size limit. And it changes the CPU mode to 32-bit or 64-bit, etc.
Operating system initiates itself.
It initializes the IDT, GDT, timers, and the data structures it needs to operate, and loads/parses the filesystem into memory.
Now you see "welcome" message.
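The boot-signature check from the BIOS step above can be sketched in Python (just an illustration; the real check is done by BIOS machine code in ROM):

```python
# A 512-byte sector is treated as bootable only if its last
# two bytes are the signature 0x55 0xAA.
def is_boot_sector(sector: bytes) -> bool:
    return len(sector) == 512 and sector[510] == 0x55 and sector[511] == 0xAA

# A fake sector for illustration: 510 zero bytes plus the signature.
sector = bytes(510) + b"\x55\xAA"
print(is_boot_sector(sector))      # True
print(is_boot_sector(bytes(512)))  # False (no signature)
```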
Now you create a file named test.asm:
C:
XOR EAX, EAX
NOP
JMP C
And your test.asm will look like this in binary (hex):
43 3a 0d 0a 58 4f 52 20 45 41 58 2c 20 45 41 58 0d 0a 4e 4f 50 0d 0a 4a 4d 50 20 43
You assemble this with assembler.
(I assembled this manually so don't believe my byte codes...)
The assembler output could be, e.g.:
31 C0 90 EB FB
The point is that your source file's bytes and the assembled binary's bytes are completely different. (An assembler merely replaces strings with particular byte sequences!)
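That difference can be shown in a few lines of Python (the source string and the machine bytes here just mirror the example above):

```python
# The assembler's *input* is plain ASCII text; its *output* is machine code.
source = "C:\r\nXOR EAX, EAX\r\nNOP\r\nJMP C"          # what you typed
machine = bytes([0x31, 0xC0, 0x90, 0xEB, 0xFB])        # xor eax,eax; nop; jmp C

# The bytes a hex editor would show for the text file:
print(source.encode("ascii").hex(" "))
# The bytes the CPU actually executes:
print(machine.hex(" "))  # 31 c0 90 eb fb
```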
And here is how the bytes are interpreted by the CPU (e.g. a 32-bit Reduced Instruction Set Computer, like old MIPS):
In short, the ALU is just a calculator, and machine language is data that tells the calculator which operator and which operands to use. The CPU takes the instruction bytes that the PC register points to, divides them into bit fields, and interprets them. The first field of an instruction (e.g. bits 0 to 5) tells the calculator what to do (e.g. ADD). The remaining fields specify the operands needed for that operation: a register id, a memory address, or a constant. For a simple example, assume that the instruction id of ADD is 011101, that the register AX has the id 00001, and that the instructions of this CPU have the following 32-bit structure:
op     rs     rt     rd     shamt  funct
0-5    6-10   11-15  16-20  21-25  26-31
Op is the operator id, rs and rt are operands 1 and 2 respectively, and rd is the destination of the operation. Shamt and funct are used for special purposes.
When you assemble the assembly instruction ADD AX, AX, AX, the assembler uses the information obtained from this line (op = 011101, rs = 00001, rt = 00001, rd = 00001, shamt = 00000, funct = 000000) and creates
01110100001000010000100000000000 (74 21 08 00)
A hex editor will show 74 21 08 00, but the CPU reads it as 011101 00001 00001 00001 00000 000000. It selects 011101 as the operator for the ALU, and feeds the registers with id 00001 (rs and rt) from the register file to the ALU as operand 1 and operand 2. When the ALU completes the calculation, the register file records the result in the register named by rd (00001). Then the CPU adds 4 to the PC register and the process repeats.
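Splitting the 32-bit word above into its fields is just shifts and masks; a minimal Python sketch:

```python
# Decode the example instruction word into the six MIPS-style fields.
word = 0b01110100001000010000100000000000  # == 0x74210800

op    = (word >> 26) & 0x3F  # bits 0-5   (6 bits)
rs    = (word >> 21) & 0x1F  # bits 6-10  (5 bits)
rt    = (word >> 16) & 0x1F  # bits 11-15 (5 bits)
rd    = (word >> 11) & 0x1F  # bits 16-20 (5 bits)
shamt = (word >> 6)  & 0x1F  # bits 21-25 (5 bits)
funct =  word        & 0x3F  # bits 26-31 (6 bits)

print(f"{op:06b} {rs:05b} {rt:05b} {rd:05b} {shamt:05b} {funct:06b}")
# 011101 00001 00001 00001 00000 000000
```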
So here's some pseudo assembler code. (Just for understanding purposes; it won't run as-is!) (Labels and jumps are intentionally omitted for simplicity.)
for(String line: filecontent)
{
    Assemble(line);
}

void Assemble(String line)
{
    String[] parsed = line.splitByCommaOrSpace();
    String operator = parsed[0];
    String operand1 = parsed[1];
    String operand2 = parsed[2];
    String operand3 = parsed[3];
    unsigned int opcode = opcodemap.get(operator);
    unsigned int operand1id = getOperandId(operand1);
    unsigned int operand2id = getOperandId(operand2);
    unsigned int operand3id = getOperandId(operand3);
    unsigned int totalcode = opcode << 26;  // op -> bits 0-5
    totalcode |= operand1id << 21;          // rs -> bits 6-10
    totalcode |= operand2id << 16;          // rt -> bits 11-15
    totalcode |= operand3id << 11;          // rd -> bits 16-20
    WriteToFile(totalcode);
}
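For the curious, here is a runnable Python version of that sketch. The opcode and register tables are toy assumptions from the example (only ADD and AX); a real assembler also handles labels, immediates, and many more opcodes:

```python
# Toy single-instruction assembler matching the six-field format above.
OPCODES   = {"ADD": 0b011101}  # assumed toy opcode table
REGISTERS = {"AX": 0b00001}    # assumed toy register-id table

def assemble_line(line: str) -> int:
    parsed = line.replace(",", " ").split()
    operator, op1, op2, op3 = parsed[0], parsed[1], parsed[2], parsed[3]
    word  = OPCODES[operator] << 26  # op -> bits 0-5
    word |= REGISTERS[op1] << 21     # rs -> bits 6-10
    word |= REGISTERS[op2] << 16     # rt -> bits 11-15
    word |= REGISTERS[op3] << 11     # rd -> bits 16-20
    return word                      # shamt and funct stay 0

print(hex(assemble_line("ADD AX, AX, AX")))  # 0x74210800
```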
Supplementary readings
RISC/CISC
CPUs can be classified by the byte length of their instructions. CISC instructions can have variable-length byte sequences for a single instruction. For example, RET is C3, NOP is 90, and INT 3 is CC (1 byte per instruction), but JMP rel32 is E9 xx xx xx xx (5 bytes), and so on. As the name says, its internal structure is complex and hard to implement. Its benefit is that CISC CPUs use memory efficiently and support numerous instructions, some of which pack a lot of work into a single instruction.
RISC (Reduced instruction set computer)
Unlike CISC, it has a fixed instruction length, like 32 bits per instruction on a 32-bit computer. What I explained above (the six fixed bit fields) was RISC. RISC instructions can be easily parsed by bit position to find the operator and operands. Its benefit is that the implementation is so much simpler than CISC that even I could understand it by studying. Its cost is the number of operators that can be supported.
More materials for CISC/RISC difference
Sorry, but I only know about 32-bit computers and am a newbie in 64-bit assembly.
I hope my answer was helpful, though it comes from the short understanding I got when I was curious about the whole scheme, from silicon to applications, like you.
Upvotes: 1
Reputation: 364190
An assembler is a software program that reads text and writes binary. It's not "special" in any way. It doesn't run as you type or anything.
CPUs run machine code stored in RAM or ROM chips. Assemblers are just convenient ways to generate the binary data, which you can then feed into an EEPROM or flash programming machine (for example) to make a chip with code in it. Or if running on the same computer, to assemble into RAM or into a file.
To bootstrap a new platform, you typically write an assembler for it on a different computer, and use that to generate binary files (or ROM / flash chips) containing machine code for the new system.
For a microcontroller, this is the normal workflow: develop on a desktop, build an image (assemble), flash it onto the embedded system with hardware connected to the desktop, then boot it on the microcontroller. With a "toy" computer like LC-3, the process would be the same. You typically wouldn't bother writing an assembler that can run on the LC-3. Although you certainly could; 64 KiB of RAM is plenty, and I think LC-3 is capable enough with bitwise ops (unlike MARIE or some other over-simplified teaching architectures) that it wouldn't take ridiculous amounts of code to do normal things like encode operands into bits.
The very first assembler to be written had to be written in machine code, maybe on punch cards, or by flipping switches on the console of a machine to create binary codes. Some hardware which humans could interact with, and which produced the desired digital logic 0s and 1s using just hardware. Lots of software was written before the first assembler existed: very early computers were so rare that you didn't use them for one-time text processing tasks; you can do that by hand!
Related: The Story of Mel is an excellent true story about a guy learning programming in assembly, working with an expert veteran who wrote his programs directly in machine-code, on a drum-memory computer in the 1960s. Definitely worth reading, there's an interesting ethical conundrum, too. Anyway, might give you a bit of an idea about programming without an assembler.
Related: How was the first assembler for a new home computer platform written? on retrocomputing.SE has some answers that might help you grok things, and specifically these comments describing the exact process of creating machine code without an assembler:
In the early days memorising binary instructions was useful because a lot of computers allowed you to alter RAM contents using physical switches on a control panel. Indeed, entering a bootloader by directly toggling RAM values was standard boot procedure on some early computers. - slebetman
@slebetman I remember doing that in one of my earlier mainframe operator jobs. We used a manually entered bootloader to load further instructions from punched card, and the punched cards contained a bootstrap that allowed us to load a full OS from a drum hard disk. Good times... – Rob Moir (later in the same thread)
Related stuff about the layers of CPU design between transistor physics and assembly language.
Upvotes: 5