Reputation: 175
I'm writing a program that loops through a vector of documents (a specific type, pointed to by m_docs). Each doc has an attribute that is a vector of ~17000 zeros, some of which are changed on occasion (the point of the loop). I have ~3200 docs. My problem is that the first hundred docs are processed rather quickly, but then it really slows down. I would like to understand why it slows down, and to know how I could fix it (or at least optimize it).
Portion of code in question:
for (int k = 0; k < m_docs->size(); k++) {
    int pos;
    std::map<std::string, std::vector<std::pair<int, int> > >::iterator it = m_index.begin();
    std::map<string, int> cleanList = (*m_docs)[k].getCleantList();
    for (auto const& p : cleanList) {
        pos = distance(it, m_index.find(p.first));
        float weight = computeIdf(p.first) * computeTf(p.first, (*m_docs)[k]);
        (*m_docs)[k].setCoord(pos, weight);
    }
}
Upvotes: 1
Views: 785
Reputation: 393944
This could be more efficient:
std::map<string,int> cleanList
into
std::map<string,int> const& cleanList
Worst case, getCleantList
already returns by value, and you get a temporary bound to a const& (which is fine). But far more likely, you eliminate a large number of memory allocations because you're no longer copying maps full of strings.
Also, look at the efficiency of the search here:
pos = distance(it, m_index.find(p.first));
You called the variable m_index
, which suggests it doesn't change inside the loop. Note that std::map iterators are bidirectional, not random-access, so std::distance walks the map node by node: each call is O(n) on top of the O(log n) find. You might need to improve locality (flat_map) or use a hash-based container (e.g. unordered_map).
Review your data structures (at the very least for the m_index
)
Upvotes: 2