Reputation: 310
I've written a program to parse a log file. The file has around a million entries, and my intent is to remove duplicate entries by date: however many times a name logs in on a given date, it should be shown only once for that date. The log output I've created is in the form:
AA 01/Jan/2013
AA 01/Jan/2013
BB 01/Jan/2013
etc. etc. all through the month of January.
This is what I've written so far. The constant i in the for loop is the number of entries to process, and namearr and datearr are the arrays used for the names and dates. My end goal is to have no repeated names for any given date. I'm trying to follow proper etiquette and protocols, so I apologize if I'm off base with this question.
My first thought was to nest a for loop that compares each entry against all previous names for that date, but since I'm learning about data structures and algorithm analysis, I'd like to avoid a high run time.
if (inFile.is_open())
{
    for (int a = 0; a < i; a++)
    {
        inFile >> name;      // Read the name
        namearr[a] = name;   // Store the name in the array
        // If the name matches the previous one, skip the entry
        if (namearr[a] == temp)
        {
            inFile.ignore(1000, '\n');  // Duplicate: skip to the next line
        }
        else
        {
            temp = name;
            inFile.ignore(1, ' ');
            inFile >> date;             // Read the date
            datearr[a] = date;          // Store the date in the array
            inFile.ignore(1000, '\n');  // Skip to the next line
            cout << namearr[a] << " " << datearr[a] << endl;   // Print the entry to the console
            oFile << namearr[a] << " " << datearr[a] << endl;  // Write the entry to the output file
        }
    }
}
Upvotes: 1
Views: 410
Reputation: 9235
You can build a key from the name and the date with simple string concatenation and use that string as the key of a map. As you process the file line by line, check whether the key is already in the map. If it is, you have already seen that name on that date and can skip the line; if it is not, add the key and write the line out.
This is efficient because the concatenated string will only be found a second time if the name has already been seen on that date, and a map can check quickly whether a key exists: in O(log n) for std::map, or O(1) on average for a hash map such as std::unordered_map.
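As a rough sketch of the idea above, the following reads "name date" lines and writes each pair only the first time it appears. It makes a few assumptions that are not in the question: the function name dedupeByNameAndDate is made up for illustration, each line is assumed to start with a name and a date separated by whitespace, and a std::unordered_set stands in for the map, since only the presence of the key matters here.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: copies the first line seen for each (name, date) pair
// from in to out and returns how many lines were written.
// Assumes each line starts with "name date", e.g. "AA 01/Jan/2013".
std::size_t dedupeByNameAndDate(std::istream& in, std::ostream& out)
{
    std::unordered_set<std::string> seen;  // composite keys already written
    std::string line, name, date;
    std::size_t written = 0;

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        if (!(fields >> name >> date)) {
            continue;  // skip lines that do not have both fields
        }

        // Join with a space: it cannot occur inside either token, so two
        // different pairs can never produce the same key.
        const std::string key = name + ' ' + date;

        // insert() returns a pair whose bool is true only for a new key.
        if (seen.insert(key).second) {
            out << name << ' ' << date << '\n';
            ++written;
        }
    }
    return written;
}
```

Passing the opened input and output file streams to this function would replace the loop in the question. Each line costs one hash lookup on average, so the whole pass stays roughly linear in the number of entries, and it does not depend on duplicates being adjacent in the file.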
Upvotes: 0
Reputation: 1651
Ughhh... You'd be better off using a regular expression library to deal with a file of that size. Take a look at Boost.Regex:
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/index.html
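For illustration only, here is a minimal sketch of pulling the name and date out of one line with a regular expression. It uses the standard library's std::regex rather than Boost.Regex so that it builds without extra dependencies. The helper name parseLogLine and the pattern are assumptions based on the "AA 01/Jan/2013" sample in the question. Note that a regex only handles the parsing step; removing the duplicates still needs some record of the pairs already seen.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: extracts the name and date from a line such as
// "AA 01/Jan/2013". Returns false and leaves the outputs untouched if the
// line does not match the assumed layout.
bool parseLogLine(const std::string& line, std::string& name, std::string& date)
{
    // Assumed layout: a name without spaces, whitespace, then dd/Mon/yyyy.
    static const std::regex pattern(R"(^(\S+)\s+(\d{2}/[A-Za-z]{3}/\d{4}))");

    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return false;
    }
    name = match[1].str();
    date = match[2].str();
    return true;
}
```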
Upvotes: 1