wastepaper

Reputation: 53

Segmentation fault while reading a large array from a file. C++/gcc

In the following code I'm trying to count how many rows in fileA share the same value in the second column. (Each row has two columns, and both are integers.) Sample of fileA:

1   22
8   3
9   3

I have to write the output in fileB like this:

22   1
3    2

This is because the value 22 appears once in the second column, and 3 appears twice.

fileA is very large (30 GB) and contains 41,000,000 elements (in other words, fileB has 41,000,000 rows). This is the code that I wrote:

void function(){

unsigned long int size = 41000000;
int* inDeg = new int[size];

for(int i=0 ; i<size; i++)
{
    inDeg[i] = 0;
}


ifstream input;
input.open("/home/fileA");

ofstream output;
output.open("/home/fileB");

int a,b;
    
while(!input.eof())
{
   input>>a>>b; 
   inDeg[b]++; //<------getting error here.
}
input.close();


for(int i=0 ; i<size; i++)
{
    output<<i<<"\t"<<inDeg[i]<<endl;
}

output.close();
delete[] inDeg;

}

I'm getting a segmentation fault on the second line of the while loop, on the 547387th iteration. I have already assigned 600M to the stack memory based on this. I'm using gcc 4.8.2 (on Mint17 x86_64).


Solved

I analysed fileA thoroughly. As hyde mentioned, the problem was not caused by hardware. The segfault was caused by wrong indexing: some values in the second column were larger than the array size. Changing the size to 61,500,000 solved my problem.

Upvotes: 0

Views: 821

Answers (2)

Thomas Matthews

Reputation: 57698

In the statement:

while(!input.eof())
{
   input>>a>>b; 
   inDeg[b]++;
}

Is b the index of your array?

When you read in the values 1 22, you are discarding the 1 and incrementing the value at slot 22 in your array.

You should check the range of b before incrementing the value at inDeg[b]:

  while (input >> a >> b)
  {
    if ((b >= 0) && (b < size))
    {
      int c = inDeg[b];
      ++c;
      inDeg[b] = c;
    }
    else
    {
      std::cerr << "Index out of range: " << b << "\n";
    }
  }

Upvotes: 2

Etixpp

Reputation: 328

You are allocating too large an array on the heap. It's a memory issue: your heap can't take that much space.

You should split your input and output into smaller parts. For example, create a for loop that processes 100k entries at a time, frees them, and then handles the next 100k.

In such cases, try exception handling. This example snippet shows how to check for exceptions when allocating arrays that are too large:

   int ii;
   double *ptr[5000000];

   try
   {
      // Keep allocating until the heap runs out of space.
      for( ii=0; ii < 5000000; ii++)
      {
         ptr[ii] = new double[5000000];
      }
   }
   catch ( bad_alloc &memoryAllocationException )
   {
      // new throws bad_alloc when the request cannot be satisfied.
      cout << "Error on loop number: " << ii << endl;
      cout << "Memory allocation exception occurred: "
           << memoryAllocationException.what()
           << endl;
   }
   catch(...)
   {
      cout << "Unrecognized exception" << endl;
   }

Upvotes: 0
