Reputation: 138
I need to handle a 2D data table of up to 100 million rows and 30 columns. The table elements contain only 0s, 1s, and dashes (-). The table is given to me in a text file. I have to store this data in some data structure and perform operations like sorting the table, deleting rows, comparing elements (both across rows and across columns), etc.
Currently I am using a 2D vector for this. I tried using an integer vector and a bit vector. Both work for 10 million rows and 25 columns, but neither works at the maximum size above (I get a std::bad_alloc).
I am assuming that this is because the vector needs contiguous memory. What would be the right data structure for this? Timing is also one of my considerations, and I want low access time as well. Also, if a vector is actually the right data structure, is there a better implementation I could use to make this work? I have to stick to C++ and can't use databases and the like.
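For scale, my rough arithmetic (assuming 4-byte ints for the integer vector):

    #include <iostream>

    int main() {
        const double cells = 100e6 * 30;  // 3 billion cells at the maximum size
        std::cout << "vector<int> (4 bytes/cell): " << cells * 4 / (1 << 30) << " GiB\n"; // ~11.2 GiB
        std::cout << "packed (2 bits/cell):       " << cells / 4 / (1 << 30) << " GiB\n"; // ~0.7 GiB
    }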
Thanks a lot!
Upvotes: 2
Views: 1282
Reputation: 20383
You can consider an array of boost::tribool. See the Boost.Tribool documentation.
You can also consider a wrapper over std::bitset such that for every tribool there are 2 bits in the bitset.
A tribool array of size N requires a bitset of size 2*N.
Mapping:
tribool(i) maps to bitset(2*i) and bitset(2*i+1)
if tribool(i) == indeterminate, then bitset(2*i) = false
if tribool(i) == true, then bitset(2*i) = true, bitset(2*i+1) = true
if tribool(i) == false, then bitset(2*i) = true, bitset(2*i+1) = false
C++ish pseudo-code:

    #include <bitset>
    #include <cstddef>
    #include <boost/logic/tribool.hpp>

    template<std::size_t N>
    class PackedTribool {
    public:
        boost::logic::tribool test(std::size_t i) const {
            if (bits_[2 * i] == false) {
                return boost::logic::indeterminate;  // flag bit clear: value is unknown
            }
            else {
                return bits_[2 * i + 1];             // flag bit set: value bit holds true/false
            }
        }
    private:
        std::bitset<2 * N> bits_;                    // note: size is doubled
    };
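The sketch above only covers the read direction; a matching writer for the same mapping could look like this (my addition; set_tribool is a hypothetical helper, not part of the answer above):

    #include <bitset>
    #include <cstddef>
    #include <iostream>
    #include <boost/logic/tribool.hpp>

    using boost::logic::tribool;
    using boost::logic::indeterminate;

    // Hypothetical write half of the scheme: encode one tribool into bits 2*i and 2*i+1.
    template<std::size_t N>
    void set_tribool(std::bitset<N>& bits, std::size_t i, tribool value) {
        if (indeterminate(value)) {
            bits[2 * i] = false;                     // flag bit clear: indeterminate
        } else {
            bits[2 * i] = true;                      // flag bit set: value bit is valid
            bits[2 * i + 1] = static_cast<bool>(value);
        }
    }

    int main() {
        std::bitset<20> bits;                        // room for ten packed tribools
        set_tribool(bits, 3, true);
        set_tribool(bits, 4, indeterminate);
        std::cout << bits << '\n';
    }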
Upvotes: 4
Reputation: 33607
Check out Nstate.
It is a library for Packed Arrays for an arbitrary radix (so tristates qualify).
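As a generic illustration of arbitrary-radix packing (this is not Nstate's actual interface): five base-3 digits fit in one byte, since 3^5 = 243 <= 256, i.e. about 1.6 bits per cell:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Generic radix-3 packing sketch (not Nstate's actual interface):
    // each byte holds five trits as a base-3 number.
    class TritArray {
    public:
        explicit TritArray(std::size_t n) : bytes_((n + 4) / 5, 0) {}

        unsigned get(std::size_t i) const {
            return bytes_[i / 5] / pow3(i % 5) % 3;
        }
        void set(std::size_t i, unsigned v) {        // v in {0, 1, 2}; 2 could encode '-'
            std::uint8_t& b = bytes_[i / 5];
            const unsigned p = pow3(i % 5);
            b = static_cast<std::uint8_t>(b - (b / p % 3) * p + v * p);
        }
    private:
        static unsigned pow3(std::size_t k) {
            static const unsigned p[5] = {1, 3, 9, 27, 81};
            return p[k];
        }
        std::vector<std::uint8_t> bytes_;
    };

At that density, the full 3-billion-cell table comes to roughly 600 MB.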
Disclaimer: I wrote nstate.
Upvotes: 6