Reputation: 18219
Data
I have N different (sorted) vectors of indices (std::vector<unsigned int>). The indices are in the range [0; L-1]. Here are two rules of thumb about this data:
Hence a possible data set with N = 10 vectors and L = 200 could be
{45, 110, 119, 145, 170}
{9, 45, 110, 145, 170, 178}
{45, 145}
{45, 178, 183}
{45, 53, 110, 170}
{9, 119, 123, 179}
{9, 45, 119, 130, 131, 170, 190, 199}
{9, 45, 110, 170, 199}
{31, 45, 145}
{9, 178, 183}
Goal
I would like to compute the frequencies of every index. I would do something like
#include <cassert>
#include <vector>

// N (number of vectors) and L (number of possible indices) are assumed to be defined elsewhere.
std::vector<double> computeFrequencies(const std::vector<std::vector<unsigned int>>& data)
{
    assert(data.size() == N);
    std::vector<double> frequencies(L); // value-initialized to 0.0
    for (unsigned Ni = 0; Ni < N; Ni++)
    {
        for (unsigned i = 0; i < data[Ni].size(); i++)
        {
            assert(data[Ni][i] < L);
            frequencies[data[Ni][i]]++;
        }
    }
    for (unsigned i = 0; i < L; i++)
    {
        frequencies[i] /= (double) N;
    }
    return frequencies;
}
I will then loop through the object returned by the function computeFrequencies exactly once:
for (unsigned i = 0; i < L; i++)
{
    foo(frequencies[i]);
}
Question
The object frequencies contains a lot of zeros, so I should probably be using a sparse vector instead. I don't have much understanding of sparse matrices, though. What type of sparse vector should I use?
I am considering using boost::numeric::ublas::coordinate_matrix<double> because, as I loop through all N vectors, I would constantly be adding new non-zero values, and I think a coordinate matrix would be good for dealing with that. Note that, generally speaking, for this function I am more worried about RAM usage than about computational time.
Upvotes: 0
Views: 312
Reputation: 59174
It doesn't look like a sparse vector representation is a good fit for your problem.
To accomplish your task as you describe it:
foo them as you go.
You can even do both steps at the same time, entirely avoiding the need to copy the data into a new structure.
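One possible reading of "both steps at the same time" is a k-way merge over the already-sorted vectors that counts equal indices as they come out of the merge. The sketch below is an assumption about what is meant, not code from this answer: forEachFrequency and its visitor parameter are hypothetical names, and it relies on every inner vector being sorted in ascending order. It needs O(N) extra memory (one heap entry and one cursor per vector) rather than O(L).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): visits every index that
// occurs in at least one vector, in increasing order, together with its
// frequency (number of occurrences / number of vectors).
// Assumes each inner vector is sorted in ascending order.
template <typename Visitor>
void forEachFrequency(const std::vector<std::vector<unsigned int>>& data,
                      Visitor visit)
{
    if (data.empty()) return;
    const double n = static_cast<double>(data.size());

    // Heap entry: (current index value, which vector it came from).
    using Entry = std::pair<unsigned int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // pos[v] is the position in data[v] of the element currently in the heap.
    std::vector<std::size_t> pos(data.size(), 0);
    for (std::size_t v = 0; v < data.size(); ++v)
        if (!data[v].empty())
            heap.push({data[v][0], v});

    while (!heap.empty())
    {
        const unsigned int index = heap.top().first;
        unsigned int count = 0;

        // Pop every entry equal to the smallest index and advance its vector.
        while (!heap.empty() && heap.top().first == index)
        {
            const std::size_t v = heap.top().second;
            heap.pop();
            ++count;
            if (++pos[v] < data[v].size())
            {
                assert(data[v][pos[v]] >= index); // input must be sorted
                heap.push({data[v][pos[v]], v});
            }
        }
        visit(index, count / n);
    }
}
```

A call such as forEachFrequency(data, [](unsigned int index, double frequency) { foo(frequency); }); would then replace both computeFrequencies and the final loop. Note that, unlike the dense loop in the question, this sketch never visits indices whose frequency is zero; if foo must also see those, the caller would have to handle the gaps between consecutive visited indices.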
Upvotes: 1