Koray Tugay

Reputation: 23800

I cannot understand the abstraction between the characters we see and how computers treat them

This is pretty low level, and English is not my mother tongue, so please go easy on me.

So imagine you are in bash and a command prompt is in front of you on the screen.

When you type ls and hit Enter, you are actually sending some bytes to the CPU, 01101100 01110011 00001010 (that is: l, s, linefeed), right? The keyboard controller sends the bytes to the CPU, and the CPU tells the operating system what bytes have been received.

So we have an application that is called 01101100 01110011 on our hard drive (or in memory..), if I understand correctly? That is a file, and it is an executable file. But how does the operating system find 01101100 01110011 on a drive or in memory?

Also I want to expand this question to functions. We say the C standard library has a function called printf, for example. How can a function have a name that is in a file? OK, I understand that the implementation of the printf function is CPU- and operating-system-specific and is a number of machine instructions lying somewhere in memory or on the hard drive. But I do not understand how we get to it.

When I link code that requires the implementation of printf, how is it found? I am assuming the operating system knows nothing about the name of the function, or does it?

Upvotes: 0

Views: 81

Answers (2)

Paul Ogilvie

Reputation: 25286

Koray, user @DrKoch gave a good answer, but I'd like to add some abstractions.

First, ASCII is a code. It is a table with bit patterns in one column and a letter in the next column. The bit patterns are exactly one byte long (leaving 'wide chars' and the like aside). If we know a byte is supposed to represent a character, then we can look up the byte's bit pattern in the table. A print function (remember matrix printers?) receives a character (a byte) and instructs the needles of the matrix printer to hammer onto the paper in some orderly way, and see: a letter is formed that humans can read. The ASCII code was devised because computers don't think in letters. There are also other codes, such as EBCDIC, which only means the table is different.
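To make that concrete, here is a minimal C sketch; the numeric value assumes an ASCII system, where 'l' is the bit pattern 01101100 (108):

    #include <stdio.h>

    int main(void)
    {
        char c = 'l';   /* one byte: the bit pattern 01101100 */

        /* The same byte printed under two different agreements:
           as a character (the terminal consults its code table)
           and as a plain number. */
        printf("as a character: %c\n", c);   /* prints: l   */
        printf("as a number:    %d\n", c);   /* prints: 108 */
        return 0;
    }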

Now, if we don't know that the byte is a representation of a letter in a certain code, then we are lost and the byte could just mean a number. We can multiply the byte with another byte. So you can multiply 'a' with 'p', which gives 97 * 112 = 10864. Does that make sense? Only if we know the bytes represent numbers; it is nonsense if the bytes represent characters.
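The same multiplication written out in C (again assuming ASCII values):

    #include <stdio.h>

    int main(void)
    {
        /* 'a' is 97 and 'p' is 112 in ASCII; nothing stops us from
           multiplying them, because to the CPU they are just numbers. */
        int product = 'a' * 'p';
        printf("%d\n", product);   /* prints: 10864 */
        return 0;
    }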

The next level is that we call a sequence of bytes that are all supposed to represent letters (characters) a 'string', and we developed functions that can search strings and get characters from or append them to strings. How long is a string? In C we agreed that the end of the string is reached when we see a byte whose bit pattern is all zeroes, the null character. In other languages, a string representation can have a length member and so doesn't need a terminating null character.
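A sketch of that agreement in C: a home-made string-length function (my_strlen is just an illustrative name) that walks the bytes until it sees the all-zero byte, which is essentially what the standard strlen() does:

    #include <stdio.h>

    /* Walk the bytes until the all-zero byte is found. */
    static size_t my_strlen(const char *s)
    {
        size_t n = 0;
        while (s[n] != '\0')
            n++;
        return n;
    }

    int main(void)
    {
        /* The compiler stores the two characters plus a terminating zero byte. */
        printf("%zu\n", my_strlen("ls"));   /* prints: 2 */
        return 0;
    }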

This is an example of a "stacking of agreements". Another example (referring to a question you asked before) is interrupts: the hardware defines a physical line on the circuit board as an interrupt line (agreement). It gets connected to the interrupt pin (agreement) of the processor. A signal on the line (e.g. from an external device) causes the processor to save the current state of its registers (agreement) and transfer control to a pre-defined memory location (agreement) where an interrupt handler is placed (agreement) which handles the request from the external device. In this example of stacking we can go many levels up to the functional application, and many levels down to the individual gates and transistors (and the basal definition of how many volts is a '1' and how many volts is a '0', and of how long that voltage must be observed before a one or a zero has definitely been seen).

Only when you understand that all these levels are only agreements can you understand a computer. And only when you understand that all these levels are only agreements made between humans can you abstract from them and not be bothered with these basics (the engineers take care of them).

Upvotes: 3

DrKoch

Reputation: 9772

You'll hardly find an answer if you look at the individual bits or bytes and the CPU.

In fact, when you type l and s, the ASCII codes of these characters are read by the shell and combined into the string "ls". At that point the shell has built a dictionary with string keys, in which it finds the key "ls" and sees that it points to a specific executable "ls" in a path like "/usr/bin".

You see, even the shell thinks in strings, not in characters, bytes or bits.
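A very rough sketch of that string-based lookup (not how any particular shell is actually implemented): try each directory of a search path and ask the operating system, via the POSIX access() call, whether an executable with that name exists there. A real shell reads the PATH environment variable and usually caches the result. The directory list here is hard-coded just for illustration:

    #include <stdio.h>
    #include <unistd.h>   /* access(), X_OK (POSIX) */

    int main(void)
    {
        const char *dirs[] = { "/bin", "/usr/bin", "/usr/local/bin" };
        const char *cmd = "ls";
        char full[4096];

        for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++) {
            /* Build e.g. "/usr/bin/ls" and test whether it is executable. */
            snprintf(full, sizeof full, "%s/%s", dirs[i], cmd);
            if (access(full, X_OK) == 0) {
                printf("found: %s\n", full);   /* e.g. found: /bin/ls */
                return 0;
            }
        }
        printf("%s: command not found\n", cmd);
        return 1;
    }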

Something very similar happens inside the linker when it tries to build an executable from your code and a collection of library files (*.lib, *.dll). It has built a dictionary with "printf" as one of the keys, which points to the correct library file and a byte offset into that file. (This is rather simplified, to demonstrate the principle.)
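A toy model of that dictionary in C; the library names and offsets are invented purely for illustration, since a real linker reads them from the object and library files themselves:

    #include <stdio.h>
    #include <string.h>

    /* Each entry says in which library a name was found and at which offset. */
    struct symbol {
        const char *name;
        const char *library;
        long        offset;
    };

    static const struct symbol table[] = {
        { "printf", "libc.so.6", 0x61c90 },   /* hypothetical offset */
        { "malloc", "libc.so.6", 0x9a0d0 },   /* hypothetical offset */
    };

    int main(void)
    {
        const char *wanted = "printf";   /* a name our code refers to */

        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            if (strcmp(table[i].name, wanted) == 0) {
                printf("%s -> %s at offset 0x%lx\n",
                       wanted, table[i].library, table[i].offset);
                return 0;
            }
        }
        printf("undefined reference to `%s'\n", wanted);
        return 1;
    }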

There are several layers of libraries (and BIOS code) before all this gets to the CPU. Don't make your life too hard; don't think too much about these layers in detail.

Upvotes: 3
