Reputation: 24705
I have wrote a code which tries to find the repetitions in a vector. In case of a repetition, it will add the position to a list. For example a sequence of 100 110 90 100 140 90 100
will be a 2D vector. The first dimension contains unique letters (or numbers) and a list of repetitions are appended as a second dimension. So the result looks like
100 -> [0 3 6]
110 -> [1]
90 -> [2 5]
140 -> [4]
The code is fairly simple
typedef unsigned long long int ulong;
typedef std::vector<ulong> aVector;
struct entry {
entry( ulong a, ulong t ) {
addr = a;
time.push_back(t);
}
ulong addr;
aVector time;
};
// vec contains original data
// theVec is the output vector
void compress( aVector &vec, std::vector< entry > &theVec )
{
aVector::iterator it = vec.begin();
aVector::iterator it_end = vec.end();
std::vector< entry >::iterator ait;
for (ulong i = 0; it != it_end; ++it, ++i) { // iterate over input vector
ulong addr = *it;
if (theVec.empty()) { // insert the first item
theVec.push_back( entry(addr, i) );
continue;
}
ait = find_if( theVec.begin(), theVec.end(), equal_addr(addr));
if (ait == theVec.end()) { // entry doesn't exist in compressed vector, so insert
theVec.push_back( entry(addr, i) );
} else { // write down the position of the repeated number (second dimension)
ait->time.push_back(i);
}
}
}
The find_if will look up like this
struct equal_addr : std::unary_function<entry,bool>
{
equal_addr(const ulong &anAddr) : theAddr(anAddr) {}
bool operator()(const entry &arg) const { return arg.addr == theAddr; }
const ulong &theAddr;
};
Problem is, for moderate input sizes (20M for my tests), the code is very slow and it may take one day to exit the compress function. Is there any chance for speedup by using std::list
instead of std::vec
? Since list
is performs better for sequential things. However I just want to know, if it can help or not. If it is useful then I have change some other codes.
Looking for any advice.
Upvotes: 0
Views: 244
Reputation: 247959
list
does not perform better for "sequential things". It performs dramatically worse for everything.The only real advantage it has is that elements in a list
are stable and pointers/iterators to elements won't be eliminated as the list is modified or grown.
Upvotes: 9