Apad
Apad

Reputation: 67

C++ Map : Smart algorithm needed

I have 3 files. F1, F2, F3. F1 is the primary file with 200K entries. F2 and F3 could either contain a superset or a subset of entries (300K or 100K). My goal is to arrive at a list of entries in F1 that are not in F2 and F3. This is how I have implemented it so far.

  1. Load up F1 entries in a C++ STL map.
  2. Start reading F2. If an entry matches, reduce the count (not erase from the map). Count = size of F1 to start with. If the count is 0, then I know that all entries in F1 are already found in F2 and so no need to traverse further in F2 or traverse through F3.
  3. The reason I am not "erasing" the entry from my map is that i read up that C++ STL map is a binary tree. And looking at my entries, there is absolutely no way my tree is going to be a balanced binary tree. It is an extremely deep tree. So any erase operation is turning out to be expensive. The find operation is also probably expensive, but the erase operation will have to recreate the tree upon every deletion.
  4. So now the problem is how do I arrive at the list of entries that exist in F2. Do I maintain a struct with a boolean flag "found = true or false"? Implying after am done with F2 and F3, I retraverse the entire STL map - and then look up values that have found = false and then start writing the delta into a file?

Any smart, efficient ways to do this?

Upvotes: 1

Views: 237

Answers (3)

jthill
jthill

Reputation: 60255

Since you say in comments that your inputs are already sequenced, just avoid containers entirely:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    ifstream f1("f1.data"), f2("f2.data"), f3("f3.data");
    string f1entry, f2entry, f3entry; 

    while ( getline(f1,f1entry) ) {
        while ( f2 && f2entry < f1entry ) getline(f2,f2entry);
        while ( f3 && f3entry < f1entry ) getline(f3,f3entry);
        if ( f1entry != f2entry
          && f1entry != f3entry )
            cout << f1entry << '\n';

    }
}

Upvotes: 1

Ed Heal
Ed Heal

Reputation: 59997

Why not read in both F2 and F3 and put them in an unordered set.

Read F1 and spit out those items that are not found in this set.

Upvotes: 0

Slava
Slava

Reputation: 44238

I do not know where you got this conclusion:

there is absolutely no way my tree is going to be a balanced binary tree.

But it is wrong. You got strange ideas about how std::map work and try to optimize it premature according to that ideas. So just delete items from map and what is left after deletion of elements from F2 and F3 in that map is what you need. If standard map is not fast enough try hash map aka unordered_map.

PS and this should be set and unordered_set

Upvotes: 0

Related Questions