Another.Chemist

Reputation: 2559

Handling a very large matrix in C

Good day everyone,

I'm new to C programming and don't know much about how to handle a very large matrix in C, e.g. a matrix of size 30,000 x 30,000.

My first approach is to allocate the memory dynamically:

#include <stdlib.h>

#define R 30000   /* rows */
#define P 30000   /* columns */

int main(void)
{
    int **mat;
    int j;
    mat = malloc(R * sizeof(int *));
    for (j = 0; j < R; j++)
        mat[j] = malloc(P * sizeof(int));
    return 0;
}

This works fine for a matrix of around 8,000 x 8,000, but not bigger. So I want to ask for any pointers on how to handle this kind of huge matrix, please.

As I said before: I am new to C, so please don't expect too much experience.

Thanks in advance for any suggestion,

David Alejandro.

PS: My laptop runs 64-bit Ubuntu Linux, with an i7 and 4 GB of RAM.

Upvotes: 2

Views: 7324

Answers (4)

Urvish

Reputation: 91

I don't know how to add comments, so I'm dropping an answer here.

One thing I can think of: you are not going to produce those values inside the running program; they will come from files. So instead of loading all the values at once, keep reading them one at a time, so the whole matrix never has to be in memory.

For a 30k x 30k matrix, if the initial value is 0 (or the same) for all elements, then instead of creating the whole matrix you can create a table of 60k x 3 (the 3 columns being: row number, column number, and value). This is because you will have at most about 60k different locations that are affected.

I know this is going to be a little slow, because you always need to check whether an element has already been added. So if speed is not your concern, this will work.

Upvotes: 0

Ed Heal

Reputation: 59997

For a matrix as large as that, I would try to avoid all those calls to malloc. This reduces the time to set up the data structure and removes the per-allocation memory overhead of dynamic memory (malloc stores additional bookkeeping information with each chunk).

Just call malloc once, i.e.:

#include <stdlib.h>
int *matrix = malloc(R * P * sizeof(int));

Then to compute the index as

index = column + row * P;

Also access the memory sequentially, i.e. keep the column index in the inner loop so that consecutive accesses touch adjacent addresses. Better performance for the cache.

Upvotes: 5

Greg E.

Reputation: 2742

Well, a two-dimensional array (the rough C analogue of a matrix) of 30000 * 30000 ints, assuming 4 bytes per int, would occupy 3.6 * 10^9 bytes, or ~3.35 gigabytes. No conventional system is going to allow you to allocate that much static virtual memory at compile time, and I'm not certain you could successfully allocate it dynamically with malloc() either.

If you only need to represent a small numerical range, you could drastically (i.e., by a factor of 4) reduce your program's memory consumption by using char. If you need to do something like, e.g., assign boolean values to specific numbers corresponding to the indices of the array, you could perhaps use bitsets and further curtail your memory consumption (by a factor of 32).

Otherwise, the only viable approach would involve working with smaller subsets of the matrix, possibly saving intermediate results to disk if necessary.

If you could elaborate on how you intend to use these massive matrices, we might be able to offer some more specific advice.

Upvotes: 1

steveha

Reputation: 76695

Assuming you are declaring your values as float rather than double, your array will be about 3.4 GB in size. As long as you only need one, and you have virtual memory on your Ubuntu system, I think you could just code this in the obvious way.

If you need multiple matrices this large, you might want to think about:

  • Putting a lot more RAM into your computer.

  • Renting time on a computing cluster, and using cluster-based processing to compute the values you need.

  • Rewriting your code to work on subsets of your data, writing each subset out to disk and freeing the memory before reading in the next subset.

You might want to do a Google search for "processing large data sets".

Upvotes: 0
