Baesm
Baesm

Reputation: 83

Stack overflow in C++ with big array

Well, I am writing a program for the university, where I have to put a data dump into HDF format. The data dump looks like this:

"1444394028","1","5339","M","873"
"1444394028","1","7045","V","0.34902"
"1444394028","1","7042","M","2"
"1444394028","1","7077","V","0.0470588"
"1444394028","1","5415","M","40"
"1444394028","1","7077","V","0.462745"
"1444394028","1","7076","B","10001101"
"1444394028","1","7074","M","19"
"1444394028","1","7142","M","16"
"1444394028","1","7141","V","0.866667"

For the HDF5 API I need an array. So my method at the moment is, to write the data dump into an array like this:

int count = 0;
std::ifstream countInput("share/ObservationDump.txt");
std::string line;

if(!countInput) cout << "Datei nicht gefunden" << endl; 

while( std::getline( countInput, line ) ) {
    count++;
}

cout << count << endl;

struct_t writedata[count];

int i = 0;

std::ifstream dataInput("share/ObservationDump.txt");
std::string line2;

char delimeter(',');

std::string timestampTemp, millisecondsSinceStartTemp, deviceTemp, typeTemp, valueTemp;

while (std::getline(dataInput, timestampTemp, delimeter) ) 
{
    std::getline(dataInput, millisecondsSinceStartTemp, delimeter);
    std::getline(dataInput, deviceTemp, delimeter);
    std::getline(dataInput, typeTemp, delimeter);
    std::getline(dataInput, valueTemp);

    writedata[i].timestamp = atoi(timestampTemp.substr(1, timestampTemp.size()-2).c_str());
    writedata[i].millisecondsSinceStart = atoi(millisecondsSinceStartTemp.substr(1, millisecondsSinceStartTemp.size()-2).c_str());
    writedata[i].device = atoi(deviceTemp.substr(1, deviceTemp.size()-2).c_str());
    writedata[i].value = atof(valueTemp.substr(1, valueTemp.size()-2).c_str());
    writedata[i].type = *(typeTemp.substr(1, typeTemp.size()-2).c_str());

    i++; 
}

with struct_t defined as

struct struct_t
{
    int timestamp;
    int millisecondsSinceStart;
    int device; 
    char type; 
    double value; 
};

As some of you might see, with big data dumps (at about 60 thousand lines) the array writedata tends to generate a stack overflow (segmentation error). I need an array to pass it to my HDF adapter. How can I prevent the overflow? I was not able to find answers by extensive googling. Thanks in advance!

Upvotes: 1

Views: 2325

Answers (2)

Daniel H
Daniel H

Reputation: 7463

The example code you are following is in C, while the code you are writing is it C++. In most cases, valid C code is valid C++ code, although not necessarily good style; this is one of the times where it is not, although since that isn’t your real problem I’ll leave the explanation of that at the end of my answer.

When you declare struct_t writedata[count];, you are creating an array on the stack. The stack is often artificially limited in size, and so creating a large array on the stack could lead to a problem where you run out of stack space. This is what you are seeing. The typical solution is to create large data structures in the heap (although the primary use of the heap is to make data that lasts past the return of the function that creates it).

The most C++-idiomatic way to access the heap is to not do it directly, but to use a helper container class. In this case what you want is an std::vector, which lets you push data onto the end and will automatically grow as you push on more data. Since it automatically grows, you don’t need to specify the size in advance; just declare it as a std::vector<struct_t> writedata; (read “std::vector of struct_t”). Again, since it doesn’t need to know the size in advance, you can also ignore the whole first loop.

The vector is initially empty; to put data into it, you usually want to use writedata.push_back() or writedata.emplace_back(). The first of these takes an existing struct_t; the second takes the parameters you’d use to create one. All of the elements are stored contiguously in memory, like in a C array, which you can access directly with writedata.data().

At the end of the function, when the vector goes out of scope and is no longer accessible, its destructor will be called and automatically clean up the memory it used.


Another option, instead of using std::vector, is to manage the memory yourself. The C++ way of doing that is with new and delete. The easiest way to handle that is to still calculate count, as you do, but instead of creating the array on the stack by just declaring it as a count-sized array, you do struct_t* writedata = new struct_t[count];. This will create an array of count struct_ts in the heap, and set writedata as a pointer to the first element of this array. Then you can use it as you use the array in your program, but since it’s on the heap you won’t run out of stack space.

The downsides to this are that you need to know the size in advance, and you need to clean up the memory you used yourself. To do this, when you no longer need the data, you should run delete[] writedata. After that, writedata will still point to the same place in memory, but your program no longer owns that data, so you need to make sure to never use that value again; the standard way is to, immediately after deletion, set writedata to nullptr.

You can also use the C equivalents to new and delete, which are malloc and free. They are mostly equivalent in your case, but for more complicated examples you should keep in mind that these leave the memory uninitialized, while new and delete will run the constructors/destructors of what you create to make sure the objects are in a sane state at the beginning and don’t leave resources lying around at the end.


Now for why your original code isn’t actually valid C++ for any size of file: Your line struct_t writedata[count]; tries to create an array of count struct_ts. Since count is a variable, this is called a variable-length array (VLA). Such things are legal in newer versions of C, but not in C++. This alone is just worth a warning as long as you only want to compile the code on the same system you’re currently using, since your compiler seems to support VLAs as an extension. However, if you want to compile your code on any other system (make it more portable), you shouldn’t use compiler extensions like this.

Upvotes: 3

Iain Brown
Iain Brown

Reputation: 1171

struct_t writedata[count];

This array is allocated on the stack which is normally quite small, and when it gets to a value that's too big (which is semi-arbitrary) this will overflow the stack. You'd be better off allocating on the heap by doing something like:

struct_t* writedata = (struct_t*)malloc(sizeof(struct_t) * count);

And then add a corresponding call to free once you're finished with the memory, e.g.

free(writedata);
writedata = nullptr;

It's best practice to check that i < count in your while loop, as if you write off the end of your array Bad Things may happen.

Upvotes: -1

Related Questions