Reputation: 644
I am allocating 2 same size arrays, one on stack, one on heap, then iterating over them with trivial assignment.
Executable is compiled to allocate 40mb for main thread stack.
This code has only been tested to compile in vc++ with /STACK:41943040 linker tag.
#include "stdafx.h"
#include <string>
#include <iostream>
#include <malloc.h>
#include <windows.h>
#include <ctime>
using namespace std;
size_t stackavail()
{
static unsigned StackPtr; // top of stack ptr
__asm mov [StackPtr],esp // mov pointer to top of stack
static MEMORY_BASIC_INFORMATION mbi; // page range
VirtualQuery((PVOID)StackPtr,&mbi,sizeof(mbi)); // get range
return StackPtr-(unsigned)mbi.AllocationBase; // subtract from top (stack grows downward on win)
}
int _tmain(int argc, _TCHAR* argv[])
{
string input;
cout << "Allocating 22mb on stack." << endl;
unsigned int start = clock();
char eathalfastack[23068672]; // approx 22mb
auto length = sizeof(eathalfastack)/sizeof(char);
cout << "Time taken in ms: " << clock()-start << endl;
cout << "Setting through array." << endl;
start = clock();
for( int i = 0; i < length; i++ ){
eathalfastack[i] = i;
}
cout << "Time taken in ms: " << clock()-start << endl;
cout << "Free stack space: " << stackavail() << endl;
cout << "Allocating 22mb on heap." << endl;
start = clock();
// auto* heaparr = new int[23068672]; // corrected
auto* heaparr = new byte[23068672];
cout << "Time taken in ms: " << clock()-start << endl;
start = clock();
cout << "Setting through array." << endl;
for( int i = 0; i < length; i++ ){
heaparr[i] = i;
}
cout << "Time taken in ms: " << clock()-start << endl;
delete[] heaparr;
getline(cin, input);
}
The output is this:
Allocating 22mb on stack.
Time taken in ms: 0
Setting through array.
Time taken in ms: 45
Free stack space: 18872076
Allocating 22mb on heap.
Time taken in ms: 20
Setting through array.
Time taken in ms: 35
Why is iteration of stack array slower than same thing on heap?
EDIT: nneonneo cought my error
Now output is identical:
Allocating 22mb on stack.
Time taken in ms: 0
Setting through array.
Time taken in ms: 42
Free stack space: 18871952
Allocating 22mb on heap.
Time taken in ms: 4
Setting through array.
Time taken in ms: 41
Release build per Öö Tiib's answer below:
Allocating 22mb on stack.
Time taken in ms: 0
Setting through array.
Time taken in ms: 5
Free stack space: 18873508
Allocating 22mb on heap.
Time taken in ms: 0
Setting through array.
Time taken in ms: 10
Upvotes: 5
Views: 1760
Reputation: 5686
There are two parts to your question :
Allocating space
First, lets look at allocating space on the stack. The stack as we know grows downwards on the x86 architecture. So, in order to allocate space on the stack, all you have to do is decrement the stack pointer. Just one assembly instruction (dec sp, #amount). This assembly instruction is always present in the prologue of a function (function set-up code). So, as far as I know, allocating space on stack must not take any time. Cost of allocating space on stack = ( decrement sp operation). On a modern super-scalar machine, this execution of this instruction will be overlapped with other instructions.
Allocating space on the heap on the other hand requires a library call to new/malloc. The library call first checks if there is some space on the heap. If yes then it will just return a pointer to the first available address. If space is not available on the stack, it will use a brk system call to request kernel to modify the page-table entries for the additional page. A system call is a costly operation. It will cause a pipeline flush, TLB pollution, etc. So, the cost of allocating space on heap = (function-call + computation for space + (brk system call)?). Definitely, allocating space on heap seems order of magnitude slower than stack.
Accessing element The addressing mode of the x86 ISA allows memory operand to be addressed using direct addressing mode (temp=mem[addr]) to access a global variable while the variables on stack are generally accessed using indexed addressing mode. (temp=mem[stack-pointer+offset-on-stack]). My assumption is that both the memory operands should take almost the same time however, the direct addressing mode seems definitely faster than the indexed addressing mode. Regarding the memory access of an array, we have two operands to access any element - base address of array and index variable. When we are accessing an array on stack, we add one more operand - the stack - pointer . The x86 addressing mode has a provision for such addresses - base+scale*index+offset . So, okay stack array element access : temp=mem[sp+base-address+iterator*element-size] and heap array access : temp=mem[base-address+iterator*element-size]. Clearly, the stack access must the costlier than the array access.
Now, coming to a generic case of iteration, if the iteration is slower for stack, it means addressing mode may(i am not completely sure) the bottle-neck and if allocating the space is bottleneck, the system call may be the bottleneck.
Upvotes: -1
Reputation: 10979
Something is wrong with your PC, on mine ages old Pentium 4 it takes 15 ms to assign such stack-based char array. Did you try with debug version or something?
Upvotes: 1
Reputation: 179422
Your arrays are not the same size; sizeof(char[23068672]) != sizeof(int[23068672])
, and the elements are of different types.
Upvotes: 9