Reputation: 15091
If a variable is a boolean or a char (in C/C++) or a byte (in some language like Java - no wait, Java's on a VM) and the CPU uses words larger than 1 byte, isn't space wasted? I have heard that when the variable is stored in memory it can be stored more compactly, for example in 1 byte even though a word is 4 bytes. Can someone explain how this happens? It seems backwards, because registers are more at a premium than RAM.
Upvotes: 3
Views: 1035
Reputation: 71586
Each instruction set has its own definitions for these terms. A byte, for example, might be 8 bits, 9 bits, or some other size. A word means whatever that architecture defines it to mean.
The 8086/8088 defined a byte as 8 bits and a word as 16 bits. You had 16 bit registers, but you could also use them as 8 bit half registers: ax is a 16 bit register, ah is the upper half of ax, and al is the lower half. Not typical, but that is how they did it. In later 80x86 parts, EAX became a 32 bit register of which ax is the lower half, and so on. Since a word was defined as 16 bits, EAX is a double word or dword. Later, 64 bit registers came along; 64 bits in this architecture is called a quad word.
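To make the ax/ah/al relationship concrete, here is a rough C sketch; the union layout assumes a little-endian host, so treat it purely as an illustration of a 16 bit register overlaid with its two byte halves, not as real hardware.

#include <stdint.h>

/* Sketch: an 8086-style AX register whose halves (AL, AH) are separately
   addressable. Assumes a little-endian host; illustration only. */
union ax_reg {
    uint16_t ax;                        /* the full 16 bit register        */
    struct { uint8_t al, ah; } half;    /* al = low byte, ah = high byte   */
};

/* example: setting .ax = 0x1234 gives .half.ah == 0x12 and .half.al == 0x34 */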
ARM defines a byte as 8 bits, a halfword as 16 bits, and a word as 32 bits. Others do this as well.
That is at the hardware level. Then you get into programming languages, which are free to change the definitions, as are programmers, who might use typedefs or whatever the language offers to create or change definitions, and they are free to make those any size they want. The language sizes don't have to match the hardware; sometimes it is a good idea to do that, but they don't have to. It is up to whoever implements the compiler and/or the backend for a particular target.
About waste...
The x86 architecture uses variable length instructions, which means there are some 8 bit instructions, some 16 bit instructions, 24 bit, 32 bit, and so on. You might want to move from one register to another, and that instruction might take only one byte; but if you wanted to move a 16 bit value into a register, that might take one byte to say "I want to move an immediate into a register" and then two more bytes to hold the immediate, three bytes total for that instruction. The instruction set was invented when memory was 8 bits wide, and for an 8 bit wide memory system this makes sense. Now that we use 32 and 64 bit wide memory systems, this is extremely painful. Nevertheless, the instruction set has 8 bit registers (ah, al, bh, bl, etc.) and can do 8 bit operations on them, so it might make sense to make a bool or other such thing a byte in size to save some space. You are already chopping your memory up into variable sizes with no alignment anyway, so you might as well.
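As a rough illustration of the space savings: the struct sizes below are what a typical x86 C implementation gives you, they are not guaranteed by the language.

#include <stdbool.h>
#include <stdio.h>

/* On a typical x86 implementation bool is one byte and int is four, so
   packing flags as bools/bytes uses a quarter of the RAM. ABI dependent. */
struct flags_as_bools { bool a, b, c, d; };   /* usually 4 bytes total   */
struct flags_as_ints  { int  a, b, c, d; };   /* usually 16 bytes total  */

int main(void)
{
    printf("%zu %zu\n", sizeof(struct flags_as_bools), sizeof(struct flags_as_ints));
    return 0;
}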
ARM - thinking traditional ARM (not thumb/thumb2) - has instructions that are always 32 bits wide, no more, no less. There are more registers than on x86, but they are not divided into half registers or byte sized registers; you don't have 8 bit operations like compare, etc. Everything is 32 bit because it is done register to register (there are some small immediates, yes). If you had a variable in your high level language that was never going to be less than -5 or greater than, say, +20, you might want to use a signed byte in your high level language to save some space. You will find that at times you have to use extra instructions to sign extend or mask the data to simulate an 8 bit operation using 32 bit registers. Saving 3 bytes cost you 4 or 8 or more; a 32 bit int would have been cheaper than a signed byte.
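A minimal sketch of what "simulate an 8 bit operation using 32 bit registers" means in C terms; the function name and exact instruction count are illustrative, but the mask and sign-extend steps are the extra work a compiler has to emit when the ALU only does full-width operations.

#include <stdint.h>

/* Hypothetical sketch: an 8 bit signed add done with a 32 bit ALU. */
int32_t add8(int8_t a, int8_t b)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* the ALU only does 32 bit adds    */
    wide &= 0xFFu;                              /* extra work: mask down to 8 bits  */
    /* extra work: sign-extend bit 7 so the result behaves like a signed byte */
    return (wide & 0x80u) ? (int32_t)wide - 256 : (int32_t)wide;
}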
There are alignment problems too. x86 allows unaligned accesses (say a 32 bit read/write from/to address 0x5 with a 32 bit data bus), but it costs you extra cycles; ARM and others don't allow it at all. By the same token, a byte write, even through the cache, costs you a read-modify-write, because the cache is likely a 32 bit wide RAM: to change one byte you have to read 32 bits, modify 8, and write 32 back, costing clock cycles. So by using an 8 bit variable instead of a 32 bit one, even though that variable might never be outside the -5 to +20 range, you cost yourself more clock cycles. More waste.
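Here is a toy model of that read-modify-write, assuming a memory that can only be accessed 32 bits at a time and little-endian byte numbering (both assumptions, purely for illustration):

#include <stdint.h>

/* Toy model: writing one byte into a 32-bit-wide RAM costs a full
   read-modify-write of the containing word. */
void write_byte(uint32_t *ram, uint32_t byte_addr, uint8_t value)
{
    uint32_t word  = ram[byte_addr >> 2];     /* read the whole 32 bit word */
    uint32_t shift = (byte_addr & 3u) * 8u;   /* pick the byte lane         */
    word &= ~(0xFFu << shift);                /* clear the old byte         */
    word |= (uint32_t)value << shift;         /* merge in the new byte      */
    ram[byte_addr >> 2] = word;               /* write the whole word back  */
}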
Now, as to your question about why a system with 8 bit registers would take more cycles to add 32 bit numbers than a 32 bit system: you already know the answer, because you most likely went to grade school and learned to add with pencil and paper.
If I want to add the decimal numbers 123 + 789 in a world where I am allowed 3 digit registers, I can perform that addition in a single cycle:
110 <--- carry bits/numbers
123
+789
====
912
Think of that as your 32 bit register based system. Now for the 8 bit register based system, the world only allows one digit at a time:
One cycle: 3 + 9, with a carry in of 0
10
3
+ 9
====
2
The carry out is 1. We have to perform that operation to get the carry out so we can use it as the carry in on the next operation with the next pair of registers. Next cycle: 2 + 8, with our carry in of 1
11
2
+ 8
====
1
The carry out is again 1. Third cycle: 1 + 7, with a carry in of 1
01
1
+ 7
====
9
and the carry out is 0 if we needed it...
The number system doesn't matter (base 10, base 2, etc.); it all works the same.
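In code, the 8 bit machine's job looks something like this sketch (the function is hypothetical; a 32 bit machine does the same thing in a single add instruction):

#include <stdint.h>

/* Sketch: adding two 32 bit numbers one byte at a time, the way a machine
   with only an 8 bit ALU has to, propagating the carry between passes. */
uint32_t add32_with_8bit_alu(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry  = 0;
    for (unsigned i = 0; i < 4; i++) {              /* low byte first, four passes   */
        unsigned byte_a = (a >> (8 * i)) & 0xFFu;
        unsigned byte_b = (b >> (8 * i)) & 0xFFu;
        unsigned sum    = byte_a + byte_b + carry;  /* one 8 bit add with carry in   */
        carry           = sum >> 8;                 /* carry out feeds the next pass */
        result         |= (sum & 0xFFu) << (8 * i);
    }
    return result;
}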
Now, if the question you were trying to ask is, for example: I have 2 and 3 and want to add them together, it really does seem to be a huge waste to use two 64 bit registers in a 64 bit processor to perform an operation that only needs a few bit columns, and likewise to move those values to/from memory. For booleans or ALU operations (add, or, etc.) the width of the register doesn't matter; the ALU is designed for that width and, with the pipeline, averages one clock. It performs a 64 bit add no matter what. Yes, that is a lot of wasted logic real estate.
See above: you can choose to use less logic but more clocks, or more logic and fewer clocks. The more-clocks solution may involve more memory cycles as well, costing even more clocks; the wider variables may waste more RAM but use fewer clocks with a wider memory system. It is all a trade off. Right now logic maxes out at a few gigahertz and memory is painfully slow, but by making the bus wider and using other parallel tricks you can make it appear faster. If you were to save that money on the logic and RAM at the expense of clock cycles, you might not be able to watch a youtube video in real time or with enough pixels to even see what is going on, or surf the web, because drawing the fonts and images, which are often compressed using math functions, would take so long the user couldn't stand it.
I recommend you look at the Microchip PIC instruction set:
http://en.wikipedia.org/wiki/Microchip_PIC
in particular the 12-bit instruction set table listed on that page. Think about the last program you wrote and imagine implementing it with that instruction set. Even better, add three simple numbers with the PIC instruction set. It has only one working register: to do math you have to go get one operand and put it in the W register, perform the math against an F register, and, if you don't want to mess up the contents of that F register, leave the result in the W register. Now add another F register, result to W, then add your third register, result to W, then save W into a fourth F register: d = a + b + c. Sure, on an ARM or MIPS or whatever, if you only need a, b, and c for that one operation you have to do three reads and then the adds, but if one or more of those operands were the results of other operations and didn't need to be stored out to RAM, because we have more registers, you start to see the economy of scale. The 6502 (on the surface, unless you understand the zero page) and other instruction sets were geared toward less logic and less RAM at the expense of clock cycles. For the older designs (x86 included) this was because logic was (relatively) expensive to make and build, and memory was equally expensive.
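In C-flavoured pseudocode the single-accumulator flow looks roughly like this; the register slots A..D and the mnemonic mapping in the comments are illustrative, not exact PIC code generation.

#include <stdint.h>

uint8_t mem[4];                       /* hypothetical file registers a, b, c, d */
enum { A = 0, B = 1, C = 2, D = 3 };

/* Sketch of d = a + b + c on a one-register (W) machine: every operand has
   to pass through the single working register. */
void add_three(void)
{
    uint8_t w;          /* the one and only working register               */
    w  = mem[A];        /* roughly: movf  a,W - fetch the first operand    */
    w += mem[B];        /* roughly: addwf b,W - add the second, keep in W  */
    w += mem[C];        /* roughly: addwf c,W - add the third, keep in W   */
    mem[D] = w;         /* roughly: movwf d   - store W to the fourth reg  */
}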
You can take this simplification of the processor, at the expense of clock cycles, to the extreme: one instruction.
http://en.wikipedia.org/wiki/Single_instruction_set_computer
opencores.org has a few of these one or few instruction processors.
You mentioned Java; its virtual machine operates at the expense of clock cycles (with a reward of portability), even if you were to build it in hardware. See the zpu at opencores for an example of a stack based processor (not Java, just another example of a stack based solution). Stack machines were certainly not invented with Java: Pascal was originally compiled to stack based pseudo code which you then implemented on the target, small-c generated stack based programs which you implemented for each target, and so on. Stack based solutions are very portable at the expense of clock cycles.
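For a feel of the stack based trade-off, here is a minimal interpreter sketch (a made-up three-opcode machine, not real JVM or zpu bytecode): every value passes through a stack in memory rather than a named register, which is where the extra clock cycles go.

#include <stdint.h>

enum op { PUSH, ADD, HALT };

/* Tiny stack machine: trivially portable to describe, but each operation
   has to touch the stack instead of working directly on registers. */
int32_t run(const int32_t *prog)
{
    int32_t stack[64];
    int sp = 0;
    for (int pc = 0; ; ) {
        switch (prog[pc++]) {
        case PUSH: stack[sp++] = prog[pc++]; break;
        case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case HALT: return stack[sp - 1];
        }
    }
}

/* usage: { PUSH, 2, PUSH, 3, ADD, HALT } evaluates 2 + 3 and returns 5 */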
The short answer to your question is: don't focus so much on the micro level; stand back a little and see the whole picture. The reason we use bigger and bigger registers in processors is speed. Most of the time you are not using all of those address bits or data bits in an ALU operation on your 64 bit processor, very true, but for the times that you do, those wider registers, busses, and memories make a huge performance difference, enough to make up for the waste elsewhere. The waste is fairly cheap anyway: in terms of the cost of the processor, memory, and disk space, an 8 bit multi gigahertz system would not be 8 times cheaper than a 64 bit multi gigahertz system if it had to deliver the same user experience, or close enough to the same.
Upvotes: 2
Reputation: 31206
Generally speaking a word is the amount of data that fits in a single register.
If you made registers narrower, say 8 bits, then it would take multiple cycles to add two 32-bit ints.
You also want your RAM address space to fit inside a word.
Upvotes: 6
Reputation: 47759
One thing to think about:
The "modern" (ca 1985) RISC scheme for processor design takes into account that it's relatively cheap to put a fairly large number of fairly wide registers on a processor chip, compared to the "cost" (in chip "real estate") of the "control logic".
While there are reasons to question the full truth of this argument, it does have some elements of truth, and, while a register is many times (probably 1000 or more) more expensive than the same amount of RAM, it's still relatively cheap. It would be false economy to make registers narrower and require more cycles to accomplish the same operations. Instead designers try to get as much accomplished in one cycle as possible.
Upvotes: 0
Reputation: 47759
Many early computers stored data in smaller increments, generally 4 or 6-bit digits. This worked fairly well when data paths were narrow and arithmetic was done serially, one digit at a time. And it was efficient in storage (which was a precious commodity).
Storage isn't that precious anymore -- throughput is -- and a computer can achieve better throughput with a wider data path.
Upvotes: 0
Reputation: 727047
In spite of their ability to store numbers, registers are not exactly "space".
Consider this example: you have a bunch of business cards in a big box, and you would like to arrange them alphabetically. In the process of doing so, you move cards in the box from one place to the other. Although you hold a card in your hands while you move it to its new place in the box, your hands do not provide storage for the cards in the same sense that the box does. The place in your hands is too valuable to be called "space".
Continuing with the cards analogy, imagine that cards have different sizes. The size of your hand lets you hold one large card, two medium cards, or four tiny cards at a time. However, when you sort the cards, the ability to take multiple cards is rarely of advantage to you, especially when the cards are uniformly shuffled: multiple cards have multiple destination spots, so you would need to do more complex or unnecessary operations if you grab multiple cards at the same time.
Similarly, a register of a CPU may contain multiple bytes, but if you need to perform a computation, there is often no way to tell the CPU which bytes to use: a register participates in an operation as an indivisible unit. That is why an entire word is used for the data in the register, even though only a single byte would be sufficient.
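You can see this reflected in C's integer promotions: even byte-sized operands are widened before arithmetic, mirroring the full-width register the hardware actually uses. A small illustration (the printed size depends on the platform's int width):

#include <stdio.h>

int main(void)
{
    char c1 = 100, c2 = 50;
    /* c1 and c2 are single bytes, but the addition happens at int width
       because of integer promotion - this typically prints 4, not 1 */
    printf("%zu\n", sizeof(c1 + c2));
    return 0;
}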
Upvotes: 8
Reputation: 27351
You can store a one-byte value in memory from a word-sized register by using the architecture's store-byte instruction (which will normally store the low-order bits of the register at the memory location).
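For example, a plain byte store in C typically compiles down to such an instruction (e.g. STRB on ARM or a byte-sized mov on x86); the function below is just an illustration.

#include <stdint.h>

/* Write only the low 8 bits of a word-sized value to memory. */
void store_low_byte(uint8_t *dst, uint32_t reg_value)
{
    *dst = (uint8_t)reg_value;   /* keeps bits 7..0, discards the rest */
}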
Upvotes: 0
Reputation: 5145
It is related to the bus width: each clock cycle moves the processor forward in units of the bus width, so it is probably not practical to optimize the register width per operand type.
Upvotes: 1