Reputation: 67
I have 3 files: F1, F2, and F3. F1 is the primary file, with 200K entries. F2 and F3 may each contain either a superset or a subset of those entries (300K or 100K). My goal is to arrive at the list of entries in F1 that appear in neither F2 nor F3. This is how I have implemented it so far.
Is there a smarter, more efficient way to do this?
Upvotes: 1
Views: 237
Reputation: 60255
Since you say in the comments that your inputs are already sorted, just avoid containers entirely:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream f1("f1.data"), f2("f2.data"), f3("f3.data");
    string f1entry, f2entry, f3entry;
    while ( getline(f1,f1entry) ) {
        // advance f2 and f3 until their current entry is no longer less than f1's
        while ( f2 && f2entry < f1entry ) getline(f2,f2entry);
        while ( f3 && f3entry < f1entry ) getline(f3,f3entry);
        // print the f1 entry only if neither of the other files holds an equal one
        if ( f1entry != f2entry
          && f1entry != f3entry )
            cout << f1entry << '\n';
    }
}
Upvotes: 1
Reputation: 59997
Why not read both F2 and F3 into a single unordered set?
Then read F1 and print the entries that are not found in that set.
Upvotes: 0
Reputation: 44238
I do not know how you reached this conclusion:
there is absolutely no way my tree is going to be a balanced binary tree.
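But it is wrong. std::map guarantees logarithmic lookup, insertion, and erasure, which implementations achieve with a self-balancing tree regardless of insertion order. You have some strange ideas about how std::map works, and you are trying to optimize prematurely based on those ideas. So just erase the F2 and F3 entries from the map; whatever is left in the map after those deletions is what you need. If the standard map is not fast enough, try a hash map, i.e. unordered_map.
PS: since you only store keys, these should really be set and unordered_set.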
Upvotes: 0