Jonny Nabors

Reputation: 310

Deleting duplicate entries in a log file C++

I've written a program to parse a log file. The file in question has around a million entries, and my intent is to remove all duplicate entries per date. So if the same name logs in 100 times on one date, the output should show only one log-in for that name on that date. The log output I've created is in the form:

AA 01/Jan/2013

AA 01/Jan 2013

BB 01/Jan 2013

etc. etc. all through the month of January.


This is what I've written so far. The constant i in the for loop is the number of entries to go through, and namearr & datearr are the arrays used for name and date. My end goal is to have no repeated values in the first field for any given date. I'm trying to follow proper etiquette and protocols, so if I'm off base with this question I apologize.

My first thought was to nest a for loop that compares each entry against all previous names for that date, but since I'm learning about Data Structures and Algorithm Analysis, I'd like to avoid the quadratic run time that comes with that approach.

if(inFile.is_open())
{
    for(int a=0;a<i;a++)
    {
        inFile>>name;//Read the name from the input file
        namearr[a]=name;//Store the name in the array
        //If the name matches the previous one, treat it as a duplicate
        if(namearr[a]==temp)
        {
            inFile.ignore(1000,'\n');//If duplicate, skip to next line
        }
        else
        {
            temp=name;
            inFile.ignore(1,' ');
            inFile>>date;//Read the date
            datearr[a]=date;//Put date into array
            inFile.ignore(1000,'\n');//Skip to next line
            cout<<namearr[a]<<" "<<datearr[a]<<endl;//Output to window
            oFile<<namearr[a]<<" "<<datearr[a]<<endl;//Output to file
        }
    }
}

Upvotes: 1

Views: 410

Answers (2)

waTeim

Reputation: 9235

You can construct a key composed of the name and the date with simple string concatenation. That string becomes the key into a map. As you process the file line by line, check whether that string is already in the map. If it is, you have already encountered that name on that day and can skip the entry. If it isn't, add the key to the map and write the entry out.

This is efficient because the key will only be found a second time if the name has already been seen on that date, and a map can check whether a key exists without scanning every entry seen so far.
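As a rough illustration of the approach described above, here is a minimal sketch. It makes a few assumptions that are not in the original post: the function name writeUniqueLogins is made up, each entry is assumed to start with two whitespace-separated tokens (name, then a date with no spaces in it, such as 01/Jan/2013), and a std::unordered_set stands in for the map, since only the presence of a key matters here.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper (not from the original post): writes each
// "name date" pair to `out` the first time it is seen on `in`.
// Returns the number of entries written.
std::size_t writeUniqueLogins(std::istream& in, std::ostream& out) {
    std::unordered_set<std::string> seen;  // name+date keys already encountered
    std::string line;
    std::size_t written = 0;

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, date;
        // Assumption: name and date are the first two tokens on the line.
        // Blank or malformed lines are skipped; extra fields are ignored.
        if (!(fields >> name >> date)) {
            continue;
        }
        // '\n' cannot occur inside a token read with >>, so it is a safe
        // separator between the two parts of the key.
        const std::string key = name + '\n' + date;
        // insert() returns a pair whose bool is true only for a new key.
        if (seen.insert(key).second) {
            out << name << ' ' << date << '\n';
            ++written;
        }
    }
    return written;
}
```

Called with the asker's inFile and oFile streams, this would make a single pass over the log. Lookups in std::unordered_set take constant time on average, while std::map would give logarithmic lookups and keep the keys sorted; either avoids comparing each entry against all previous ones.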

Upvotes: 0

TravellingGeek

Reputation: 1651

Ughhh... You'd be better off using a regular expression library to deal with a file of that size. Check out Boost.Regex:

http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/index.html
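To give an idea of what regex-based parsing of one log line could look like, here is a small sketch. It uses std::regex from the C++11 standard library rather than Boost.Regex so that it has no external dependency; Boost.Regex offers a similar interface (boost::regex, boost::smatch, boost::regex_search). The LogEntry struct, the parseLogLine function, and the exact line layout (name, whitespace, DD/Mon/YYYY) are assumptions made for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical record type, introduced only for this example.
struct LogEntry {
    std::string name;
    std::string date;
};

// Tries to parse a line such as "AA 01/Jan/2013" into `entry`.
// Returns false (leaving `entry` untouched) if the line does not match.
bool parseLogLine(const std::string& line, LogEntry& entry) {
    // Assumed layout: optional leading whitespace, a name without spaces,
    // whitespace, then a date of the form DD/Mon/YYYY. Anything after the
    // date is ignored.
    static const std::regex pattern(R"(^\s*(\S+)\s+(\d{2}/[A-Za-z]{3}/\d{4}))");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return false;
    }
    entry.name = match[1].str();  // first capture group: the name
    entry.date = match[2].str();  // second capture group: the date
    return true;
}
```

Note that this only covers extracting the name and date from a line. Removing the duplicates still requires keeping track of which name and date pairs have already been seen.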

Upvotes: 1
