Reputation: 53
I am very new to assembly, and this is a basic question.
I have just heard about the concept of using zero bytes of RAM.
I compiled some C++ code with
g++ -O3 main.cpp -S -o main3.s
main.cpp (source)
#include <iostream>
using namespace std;
int main()
{
int low=10, high=100, i, flag;
cout << "Prime numbers between " << low << " and " << high << " are: ";
while (low < high)
{
flag = 0;
for(i = 2; i <= low/2; ++i)
{
if(low % i == 0)
{
flag = 1;
break;
}
}
if (flag == 0)
cout << low << " ";
++low;
}
return 0;
}
And here is the result:
main3.s
.file "main.cpp"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Prime numbers between "
.LC1:
.string " and "
.LC2:
.string " are: "
.LC3:
.string " "
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB1561:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movl $22, %edx
movl $.LC0, %esi
movl $_ZSt4cout, %edi
call _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
movl $10, %esi
movl $_ZSt4cout, %edi
call _ZNSolsEi
movl $5, %edx
movq %rax, %rbx
movl $.LC1, %esi
movq %rax, %rdi
call _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
movq %rbx, %rdi
movl $100, %esi
movl $10, %ebx
call _ZNSolsEi
movl $.LC2, %esi
movq %rax, %rdi
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
.p2align 4,,10
.p2align 3
.L6:
movl %ebx, %esi
sarl %esi
testb $1, %bl
je .L2
movl $2, %ecx
jmp .L3
.p2align 4,,10
.p2align 3
.L14:
movl %ebx, %eax
cltd
idivl %ecx
testl %edx, %edx
je .L2
.L3:
addl $1, %ecx
cmpl %esi, %ecx
jle .L14
movl %ebx, %esi
movl $_ZSt4cout, %edi
call _ZNSolsEi
movl $1, %edx
movl $.LC3, %esi
movq %rax, %rdi
call _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
.L2:
addl $1, %ebx
cmpl $100, %ebx
jne .L6
xorl %eax, %eax
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE1561:
.size main, .-main
.p2align 4,,15
.type _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB2045:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZStL8__ioinit, %edi
call _ZNSt8ios_base4InitC1Ev
movl $__dso_handle, %edx
movl $_ZStL8__ioinit, %esi
movl $_ZNSt8ios_base4InitD1Ev, %edi
addq $8, %rsp
.cfi_def_cfa_offset 8
jmp __cxa_atexit
.cfi_endproc
.LFE2045:
.size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
.section .init_array,"aw"
.align 8
.quad _GLOBAL__sub_I_main
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.hidden __dso_handle
.ident "GCC: (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0"
.section .note.GNU-stack,"",@progbits
This is a basic program that can keep all of its variables in CPU registers, so I guess it does not use RAM. What are the criteria for checking whether a piece of assembly code uses RAM?
Upvotes: 5
Views: 2165
Reputation: 365237
In the clip you linked, Jason Turner just said that the C local variables all fit in registers, so the compiler doesn't ever have to spend extra instructions spilling/reloading them.
It's still using RAM to store code and data; it's just not using any stack memory to store local variables, i.e. zero bytes of RAM for local variables, not zero bytes total. He even says the game compiles to 1005 bytes (of code + data).
You detect this when reading asm by noting a lack of loads/stores to the stack, i.e. no memory operands whose addressing modes use RSP (or RBP if it's used as a frame pointer), on x86-64.
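As a rough illustration of that check, here is a minimal sketch that scans AT&T-syntax assembly text (the kind g++ -S emits by default, as in the question) and counts lines with an explicit memory operand based on %rsp or %rbp, plus push/pop instructions. The names StackUse and count_stack_use are made up for this example, and the string matching is only a heuristic: it doesn't parse operands properly, it would also count an lea that merely computes a stack address, and it ignores the return addresses that call/ret implicitly push and pop.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical tally of how often a chunk of AT&T-syntax x86-64 asm touches the stack.
struct StackUse {
    int explicit_accesses = 0;  // lines with a memory operand based on %rsp or %rbp
    int push_pop = 0;           // push / pop instructions
};

// Heuristic line-by-line scan; this is substring matching, not a real assembler parser.
StackUse count_stack_use(const std::string& asm_text) {
    static const std::set<std::string> push_pop_mnemonics = {
        "push", "pushq", "pushw", "pop", "popq", "popw"};
    StackUse result;
    std::istringstream in(asm_text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t start = line.find_first_not_of(" \t");
        if (start == std::string::npos) continue;  // blank line
        if (line[start] == '.') continue;          // directive (.cfi_*, .p2align) or local label (.L6:)
        // The mnemonic is the first whitespace-delimited token on the line.
        const std::size_t end = line.find_first_of(" \t", start);
        const std::string mnemonic = line.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        if (push_pop_mnemonics.count(mnemonic) != 0) ++result.push_pop;
        // Base-register forms such as -20(%rbp), (%rsp) or 8(%rsp,%rax,4).
        if (line.find("(%rsp") != std::string::npos ||
            line.find("(%rbp") != std::string::npos) {
            ++result.explicit_accesses;
        }
    }
    return result;
}
```

Fed the main from the question, this should report zero explicit accesses and one push/pop pair: the pushq %rbx / popq %rbx that saves and restores a call-preserved register. So even that function touches the stack a little, just not for its local variables.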
This is totally normal for functions that aren't huge. Inlining function calls is key to making it happen, because compilers usually have to have memory "in sync" (reflecting the correct values of the C abstract machine) when calling a non-inline function.
int foo(int num) {
int tmp = num * num;
return tmp;
}
gets num in a register, and keeps tmp there. Jason's talk was using Godbolt, so here's a link to the same function on Godbolt, compiled by gcc7.3 with and without optimization:
foo: # with optimization: all operands are registers
imul edi, edi
mov eax, edi
ret
foo: # without optimization:
push rbp
mov rbp, rsp # make a stack frame with RBP
mov DWORD PTR [rbp-20], edi # spill num to the stack
# start of code for first C statement
mov eax, DWORD PTR [rbp-20] # reload it
imul eax, DWORD PTR [rbp-20] # and use it from memory again
mov DWORD PTR [rbp-4], eax # spill tmp to the stack
# end of first C statement
mov eax, DWORD PTR [rbp-4] # load tmp into the return value register (eax)
pop rbp
ret
This didn't have to reserve any stack space with sub rsp, 24, because it's using the red-zone below RSP for the locals it's spilling / reloading.
Obviously with optimization enabled, you won't get code this bad even when a compiler does run out of registers in a large complex function and has to spill something. -O0 is kind of an anti-optimization mode where each C statement gets a separate block of asm, so you can set breakpoints and modify variables and have the code still work. Or even jump to a different source line in gdb!
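To illustrate the earlier point about non-inline calls forcing memory to be "in sync", here is a small sketch you could paste into Godbolt. The function names are made up for the example, and __attribute__((noinline)) is a GCC/Clang extension used here only to stop the optimizer from inlining the callee. What I'd expect with optimization enabled (not guaranteed for every compiler or version) is that stays_in_register uses only registers, while needs_memory has to give tmp a stack slot because its address is passed to a call the compiler can't see through.

```cpp
// Hypothetical callee; noinline (GCC/Clang-specific) keeps it an actual call.
__attribute__((noinline)) void bump(int* p) {
    *p += 1;
}

// tmp never has its address taken, so an optimizing compiler can keep it in a register.
int stays_in_register(int x) {
    int tmp = x * x;
    return tmp + 1;
}

// The address of tmp escapes to a non-inlined function, so tmp typically has to
// live in stack memory around the call: stored before it, reloaded after it.
int needs_memory(int x) {
    int tmp = x * x;
    bump(&tmp);
    return tmp;
}
```

Both functions compute the same value; the difference should only show up in the asm, as a store to and a load from an RSP-relative address around the call to bump. If you drop the noinline attribute, the compiler will most likely inline bump and the stack traffic should disappear again.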
Re: How many registers does x86 have, as mentioned in the talk:
i386 has 8 architectural integer registers. It has some segment registers you could abuse to keep extra values, and if it has an FPU there are 8 x87 80-bit FP stack registers. Jason's guess of 16 sounds bogus, but he may be counting AL/AH, BL/BH as separate 8-bit registers, because you can use them independently. But not at the same time as EAX, because the narrow registers are subsets of full registers.
(And beware of partial-register penalties on various modern microarchitectures. On AMD, AL and AH aren't independent at all; using one has a false dependency on the other, i.e. on the whole EAX/RAX. On CPUs up to and including Pentium P5MMX, there were no partial-register penalties at all, because they had no out-of-order execution or register renaming.)
His claim that modern x86-64 has hundreds of registers is also definitely bogus, unless you count all the control registers and model-specific registers. But stack memory is much faster than those registers, and you can't put arbitrary values in them anyway. With only 16 architectural integer registers (one of them being the stack pointer, so really 15 regs you can use in a big function), you still need extra instructions to spill or at least reload stuff when you need more variables "live" at once than that.
Register renaming onto a large pool of physical registers is great, and essential along with a large ReOrder Buffer for a large out-of-order execution window to find instruction-level parallelism. But you can only take advantage of these registers by reusing the same integer registers for different values. (i.e. register renaming avoids write-after-read and write-after-write hazards, making two uses of the same register actually independent.)
Haswell has a 168-entry physical register file for integer/GP registers, and also a 168-entry vector/FP register file for renaming FP / vector registers (see https://www.realworldtech.com/haswell-cpu/3/). But architecturally it only has 16 GP / 16 YMM registers when running in x86-64 mode, or 8 / 8 in IA-32 mode.
Upvotes: 6
Reputation: 2202
Variables are not the only thing that main memory stores. In fact, when you run a program, your operating system reserves some space (called address space) for the process in charge of running your executable file.
The machine code generated by the compilation is stored in a section (the .text section), the initialized data in (you don't say) the .data section, static variables initialized to 0 in the .bss section, and so on. Strings, for example, are usually stored in read-only sections (.rodata).
So the answer is no: every program, when running, has to use memory.
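As a small sketch of where things usually end up, the snippet below annotates a few objects with the section they typically land in when compiled by GCC for an ELF target such as Linux. The names are invented for the example, and the placement comments describe common behaviour, not a guarantee; compiling with g++ -S and looking at the .section directives, or running the binutils size tool on the result, should show what your toolchain actually does.

```cpp
// Typical placement with GCC on an ELF target; an assumption, check your own output.

int initialized_global = 42;   // nonzero initializer: typically .data
int zeroed_global;             // zero-initialized static storage: typically .bss

// The literal "hello" typically goes into a read-only section such as .rodata
// (the question's output uses .rodata.str1.1 for its strings).
const char* greeting() {
    return "hello";
}

// The machine code for this function lives in .text. Its locals (total, i) can
// stay in registers when optimizing, so they need no memory of their own.
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Even if every local variable stays in a register, the code, the globals and the string literal all occupy memory once the program is loaded.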
Upvotes: 2