Reputation: 2559
Good day everyone,
I'm new to C programming and don't know much about handling very large matrices in C, e.g. a matrix of size 30,000 x 30,000.
My first approach was to allocate the memory dynamically:
#include <stdlib.h>

#define R 8000
#define P 8000

int main(void)
{
    int **mat;
    int j;

    mat = malloc(R * sizeof(int *));
    for (j = 0; j < R; j++)
        mat[j] = malloc(P * sizeof(int));
    return 0;
}
This works fine for a matrix of roughly 8,000 x 8,000, but no bigger. So I'm asking for any pointers on how to handle this kind of huge matrix, please.
As I said, I am new to C, so please don't expect too much experience.
Thanks in advance for any suggestion,
David Alejandro.
PS: My laptop runs 64-bit Ubuntu Linux with an i7 and 4 GB of RAM.
Upvotes: 2
Views: 7324
Reputation: 91
I don't know how to add comments, so I'm dropping an answer here.
One thing to consider: you are not going to generate those values inside the running program; they will come from files. So instead of loading all the values at once, keep reading them one by one so the whole matrix never has to sit in memory.
For a 30k x 30k matrix whose initial value is 0 (or the same) for all elements, instead of creating the whole matrix you can create a 60k x 3 table (the three columns being: row number, column number, and value). This works because you will have at most 60k different locations that are actually affected.
I know this is going to be a little slow, because you always need to check whether an element has already been added. So if speed is not your concern, this will work.
Upvotes: 0
Reputation: 59997
For a matrix as large as that, I would try to avoid all those calls to malloc. This reduces the time needed to set up the data structure and removes the per-allocation overhead of dynamic memory (malloc stores additional bookkeeping information about the size of each chunk).
Just use malloc once, i.e.:
#include <stdlib.h>
int *matrix = malloc(R * P * sizeof(int));
Then compute the index as
index = column + row * P;
Also access the memory sequentially, i.e. with the column index varying fastest in the inner loop, for better cache performance.
Upvotes: 5
Reputation: 2742
Well, a two-dimensional array (the rough C analogue of a matrix) of 30000 * 30000 ints, assuming 4 bytes per int, would occupy 3.6 * 10^9 bytes, or ~3.35 gigabytes. No conventional system is going to allow you to allocate that much static virtual memory at compile time, and I'm not certain you could successfully allocate it dynamically with malloc() either. If you only need to represent a small numerical range, then you could drastically (i.e., by a factor of 4) reduce your program's memory consumption by using char. If you need to do something like, e.g., assign boolean values to specific numbers corresponding to the indices of the array, you could perhaps use bitsets and further curtail your memory consumption (by a factor of 32). Otherwise, the only viable approach would involve working with smaller subsets of the matrix, possibly saving intermediate results to disk if necessary.
If you could elaborate on how you intend to use these massive matrices, we might be able to offer some more specific advice.
Upvotes: 1
Reputation: 76695
Assuming you are declaring your values as float rather than double, your array will be about 3.4 GB in size. As long as you only need one, and you have virtual memory on your Ubuntu system, I think you could just code this in the obvious way.
If you need multiple matrices this large, you might want to think about:
- Putting a lot more RAM into your computer.
- Renting time on a computing cluster, and using cluster-based processing to compute the values you need.
- Rewriting your code to work on subsets of your data, writing each subset out to disk and freeing the memory before reading in the next subset.
You might want to do a Google search for "processing large data sets".
Upvotes: 0