enko
enko

Reputation: 644

Why is iterating a large array on the heap faster than iterating same size array on the stack?

I am allocating 2 same size arrays, one on stack, one on heap, then iterating over them with trivial assignment.

Executable is compiled to allocate 40mb for main thread stack.

This code has only been tested to compile in vc++ with /STACK:41943040 linker tag.

#include "stdafx.h"
#include <string>
#include <iostream>
#include <malloc.h>
#include <windows.h>
#include <ctime>

using namespace std;

size_t stackavail()
{
    static unsigned StackPtr;   // top of stack ptr
    __asm mov [StackPtr],esp    // mov pointer to top of stack
    static MEMORY_BASIC_INFORMATION mbi;            // page range
    VirtualQuery((PVOID)StackPtr,&mbi,sizeof(mbi)); // get range
    return StackPtr-(unsigned)mbi.AllocationBase;   // subtract from top (stack grows downward on win)
}

int _tmain(int argc, _TCHAR* argv[])
{
    string input;

    cout << "Allocating 22mb on stack." << endl;
    unsigned int start = clock();
    char eathalfastack[23068672]; // approx 22mb
    auto length = sizeof(eathalfastack)/sizeof(char);
    cout << "Time taken in ms: " << clock()-start << endl;

    cout << "Setting through array." << endl;
    start = clock();
    for( int i = 0; i < length; i++ ){
        eathalfastack[i] = i;
    }
    cout << "Time taken in ms: " << clock()-start << endl;
    cout << "Free stack space: " << stackavail() << endl;


    cout << "Allocating 22mb on heap." << endl;
    start = clock();
    // auto* heaparr = new int[23068672]; // corrected
    auto* heaparr = new byte[23068672];
    cout << "Time taken in ms: " << clock()-start << endl;

    start = clock();
    cout << "Setting through array." << endl;
    for( int i = 0; i < length; i++ ){
        heaparr[i] = i;
    }
    cout << "Time taken in ms: " << clock()-start << endl;

    delete[] heaparr;
    getline(cin, input);
}

The output is this:

    Allocating 22mb on stack.
    Time taken in ms: 0
    Setting through array.
    Time taken in ms: 45
    Free stack space: 18872076
    Allocating 22mb on heap.
    Time taken in ms: 20
    Setting through array.
    Time taken in ms: 35

Why is iteration of stack array slower than same thing on heap?

EDIT: nneonneo cought my error

Now output is identical:

    Allocating 22mb on stack.
    Time taken in ms: 0
    Setting through array.
    Time taken in ms: 42
    Free stack space: 18871952
    Allocating 22mb on heap.
    Time taken in ms: 4
    Setting through array.
    Time taken in ms: 41

Release build per Öö Tiib's answer below:

    Allocating 22mb on stack.
    Time taken in ms: 0
    Setting through array.
    Time taken in ms: 5
    Free stack space: 18873508
    Allocating 22mb on heap.
    Time taken in ms: 0
    Setting through array.
    Time taken in ms: 10

Upvotes: 5

Views: 1760

Answers (3)

prathmesh.kallurkar
prathmesh.kallurkar

Reputation: 5686

There are two parts to your question :

  1. Allocating space on the stack vs heap
  2. Accessing a memory location on stack vs globally visible

Allocating space

First, lets look at allocating space on the stack. The stack as we know grows downwards on the x86 architecture. So, in order to allocate space on the stack, all you have to do is decrement the stack pointer. Just one assembly instruction (dec sp, #amount). This assembly instruction is always present in the prologue of a function (function set-up code). So, as far as I know, allocating space on stack must not take any time. Cost of allocating space on stack = ( decrement sp operation). On a modern super-scalar machine, this execution of this instruction will be overlapped with other instructions.

Allocating space on the heap on the other hand requires a library call to new/malloc. The library call first checks if there is some space on the heap. If yes then it will just return a pointer to the first available address. If space is not available on the stack, it will use a brk system call to request kernel to modify the page-table entries for the additional page. A system call is a costly operation. It will cause a pipeline flush, TLB pollution, etc. So, the cost of allocating space on heap = (function-call + computation for space + (brk system call)?). Definitely, allocating space on heap seems order of magnitude slower than stack.

Accessing element The addressing mode of the x86 ISA allows memory operand to be addressed using direct addressing mode (temp=mem[addr]) to access a global variable while the variables on stack are generally accessed using indexed addressing mode. (temp=mem[stack-pointer+offset-on-stack]). My assumption is that both the memory operands should take almost the same time however, the direct addressing mode seems definitely faster than the indexed addressing mode. Regarding the memory access of an array, we have two operands to access any element - base address of array and index variable. When we are accessing an array on stack, we add one more operand - the stack - pointer . The x86 addressing mode has a provision for such addresses - base+scale*index+offset . So, okay stack array element access : temp=mem[sp+base-address+iterator*element-size] and heap array access : temp=mem[base-address+iterator*element-size]. Clearly, the stack access must the costlier than the array access.

Now, coming to a generic case of iteration, if the iteration is slower for stack, it means addressing mode may(i am not completely sure) the bottle-neck and if allocating the space is bottleneck, the system call may be the bottleneck.

Upvotes: -1

&#214;&#246; Tiib
&#214;&#246; Tiib

Reputation: 10979

Something is wrong with your PC, on mine ages old Pentium 4 it takes 15 ms to assign such stack-based char array. Did you try with debug version or something?

Upvotes: 1

nneonneo
nneonneo

Reputation: 179422

Your arrays are not the same size; sizeof(char[23068672]) != sizeof(int[23068672]), and the elements are of different types.

Upvotes: 9

Related Questions