10GeV
10GeV

Reputation: 475

Efficiently storing a matrix with many zeros, dynamically

Background:

I'm working in .

I recall there being a method to efficiently (memory-wise) store "arrays" (where an array might be made of std::vector's, std::set's, etc... I don't care how, so long as it is memory efficient and I'm able to check the value of each element) of 0's and 1's (or, equivalently, truth/false, etc), wherein there is a disproportionate number of one or the other (e.g. mostly zeroes).


I've written an algorithm, which populates an "array" (currently, a vector<vector<size_t>>) with 0's and 1's according to some function. For these purposes, we can more-or-less consider it as being done randomly. The array is to be quite large (of variable size... on the order of 1000 columns, and 1E+8 or more rows), and always rectangular.

There need be this many data points. In the best of times, my machine becomes quickly resource constrained and slows to a crawl. At worst, I get std::bad_alloc.

Putting aside what I intend to do with this array, what is the most efficient (memory-wise) way to store a rectangular array of 1's and 0's (or T/F, etc), where there are mostly 1's or 0's (and I know which is most populous)?.

Note that the array need be created "dynamically" (i.e. one element at a time), elements must maintain their location, and I need only to check the value of individual elements after creation. I'm concerned about memory footprint, nothing else.

Upvotes: 0

Views: 169

Answers (1)

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275575

This is known as a sparse array or matrix.

std::set<std::pair<int,int>> bob;

If you want 7,100 to be 1, just bob.insert({7,100});. Missing elements are 0. You can use bob.count({3,7}) for a 0/1 value if you like.

Now looping over both columns are rows is tricky; easiest is to make 2 sets each backwards.

If you have no need to loop in order, use an unordered set instead.

Upvotes: 1

Related Questions