Reputation: 8418
I have to use a sparse vector for some things in my current project. But, since I am not in charge of the project, I can not use whatever outside libraries I want. I only have STL and OpenCV available.
I have looked through several stackoverflow answered questions, but they either focus on a specific approach, comparison of a limited number of approaches (2) and outside libraries when they deal specifically with sparse vectors. There is also some excellent ideas for implementing a sparse matrix.
What I want is specifically a sparse vector (the indexes will always be in 1 dimension, the data is not relevant for this question). I would like something that will not be a project on it's own to implement, but could be used for more-than-demonstrative-purposes (e.g. I want to achieve a decent speed and not too much of a memory overhead) and hopefully re-used later. The options I have considered included:
std::map
to store the values (or maybe making a very simple wrapper that would return a default value in case of indexing a zero-element)std::vector< std::pair < int , data_type > >
where I could store the index and the data in the std::pair
elementsIs any of these solutions better/worse for the general purpose use as a sparse vector? I know every approach to everything has it's ups and downs, but argumented suggestions on which approach to choose would be very much appreciated. Also, recommending an approach I have not considered would be more than welcome if anyone thinks he has a better suggestion.
The usage in my specific case is as follows:
Still, as I have written in my original question, I would like suggestions for a general-purpose Sparse Vector implementation.
Upvotes: 6
Views: 4209
Reputation: 47513
I believe std::map
would give you the best results. SpareseMat
, I don't know, but among the two other methods you mentioned, std::map
will give you O(log(n))
lookup as well as O(log(n))
insertion and deletion. The vector
however requires a search over all its data (so it has O(n)
lookup). It has O(1)
insertion, but O(n)
removal. My guess is that you will have a lot of lookup, so most probably std::map
is better for you.
Depending on your application, you might want to use the vector
method during initial creation of the structure and then convert it to map
once you start using it to get the best of both worlds (but often this is not the case, for example in the cases where you have repeating indices).
Besides hash
which should give you O(1)
everything, but in reality might not, the lookup of O(log(n))
is the best you can hope for. You may come up with a vector that you can binary search on, or any other method based on searching for the data by comparison, but in the end they would all be O(log(n))
so you might as well use the easy-already-done std::map
.
Update: Based on the update on your question, that indicates that the vector will most likely not be modified after it is created and a most common operation is anticipated to be dot product, I would suggest the following:
First, use a vector of pairs as you have suggested yourself. During creation, simply push_back
and you get O(1)
performance.1 After that, you can sort the vector. The dot product would be quite simple2:
int dot = 0;
unsigned int index_v1 = 0, index_v2 = 0;
while (index_v1 < v1.size() && index_v2 < v2.size())
if (v1[index_v1].first == v2[index_v2].first)
dot += v1[index_v1++].second * v2[index_v2++].second;
else if (v1[index_v1].first < v2[index_v2].first)
++index_v1;
else
++index_v2;
Checking whether a certain element is a zero-element or not would be a simple binary search, checking whether the element could be found or not (O(log(n))
performance).
Given the fact that you would be using this structure as a point, I believe keeping it a vector would be better. You might want to later do cross product or other geometric operations.
Regarding the fact that you may need to insert something in the vector every now and then, then you would have to insert it in place (so the vector remains sorted). The performance would be O(n)
, but since it doesn't happen often, it shouldn't be an issue.
1 Unless you have millions of these vectors, O(1)
and O(log(n))
for n ~= 500
should not really make any noticeable difference.
2 You could definitely also use a map
and use iterators over it to do the dot product in order of index. The performance will be the same if std::map
uses a threaded tree that lets you get to the next node in O(1)
.
Upvotes: 6
Reputation: 299810
KISS
Start with the simplest solution, that is wrap std::map<size_t, YourData>
into a specific sparse vector structure:
template <typename V>
class SparseVector {
public:
private:
std::map<size_t, V> specificValues;
V defaultValue;
};
The map
will give good default performances in all use cases (find/insert/update) and the use of a custom class means that if you ever need to change to another implementation, you will not have to change the clients sources, just recompile.
Upvotes: 3