John

Reputation: 971

Using Python to compare one file with another for missing entries

I have two files. I want to get a list of ids for NEW orders that are in Master.txt but not in Subset.txt. Master.txt also contains existing orders (EXIST), which are not in Subset.txt, so it's not a 1:1 comparison of the files.

import json

master = "Master.txt"
subset = "Subset.txt"

foundCount = 0
notFoundCount = 0
notFoundDict = []

for logLine in open(master, "r"):
    if len(logLine) > 1:
        if "NEW" in logLine:
            # Master.txt lines are JSON objects, so parse them as JSON
            newItemDict = json.loads(logLine)
            id = str(newItemDict['id'])

            for subsetLogLine in open(subset, "r"):
                if id in subsetLogLine and "NEW" in subsetLogLine:
                    foundCount += 1
                    break
                else:
                    # This branch runs for every non-matching line, not once per id
                    notFoundCount += 1
                    notFoundDict.append(id)

Unfortunately, what happens is that it takes the id from a line in Master.txt and compares it against each line in Subset.txt in turn. Every line that doesn't contain that id causes the id to be added to notFoundDict, even if a later line does match.

So I want it to search all of Subset.txt, append the id only if it is not found anywhere in the file, and break as soon as it is found.

Master.txt
{"Type":"NEW","id":201753427,"time":"08:11:57.545","title":"string"}
{"Type":"NEW","id":201753195,"time":"08:11:58.616","title":"string"}
{"Type":"EXIST","id":201753195,"time":"08:11:59.639","title":"string"}
{"Type":"UPDATE","id":201753195,"time":"08:13:57.319","title":"string"}
{"Type":"UPDATE","id":201753195,"time":"08:15:51.119","title":"string"}
{"Type":"NEW","id":201753199,"time":"08:19:13.114","title":"string"}


Subset.txt
{NEWORDID="201753427" ORDTYPE="NEW" ORIGIN="LocationA" USERNAME="..." TIME="08:11:57.645"}
{NEWORDID="201753195" ORDTYPE="NEW" ORIGIN="LocationC" USERNAME="..." TIME="08:11:57.619"}
{NEWORDID="201753199" ORDTYPE="NEW" ORIGIN="LocationC" USERNAME="..." TIME="08:19:13.114"}

Upvotes: 1

Views: 159

Answers (1)

Jason

Reputation: 475

Have you considered a different approach?

Load all new order ids from file 1 into a set.

Load all new order ids from file 2 into a set.

Then find all the objects in the file 1 set that aren't in the file 2 set.

Seems like a simpler way to tackle your problem unless the files are unusually large.
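
A rough sketch of what I mean, assuming the line formats are exactly as in your samples (JSON objects in Master.txt, NEWORDID="..." and ORDTYPE="..." attributes in Subset.txt). The function names and the two regular expressions are my own, not anything from your code, so adjust them to your real data:

```python
import json
import re

# Assumed patterns for Subset.txt lines, based only on the sample in the question.
SUBSET_ID = re.compile(r'NEWORDID="(\d+)"')
SUBSET_TYPE = re.compile(r'ORDTYPE="(\w+)"')


def master_new_ids(lines):
    """Collect the ids of NEW orders from JSON-formatted Master.txt lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("Type") == "NEW":
            # Master ids are numbers, Subset ids are quoted strings: normalise to str.
            ids.add(str(record["id"]))
    return ids


def subset_new_ids(lines):
    """Collect the ids of NEW orders from Subset.txt lines."""
    ids = set()
    for line in lines:
        id_match = SUBSET_ID.search(line)
        type_match = SUBSET_TYPE.search(line)
        if id_match and type_match and type_match.group(1) == "NEW":
            ids.add(id_match.group(1))
    return ids


def missing_new_ids(master_lines, subset_lines):
    """Return NEW ids present in the master lines but absent from the subset lines."""
    return master_new_ids(master_lines) - subset_new_ids(subset_lines)
```

Both arguments can be open file objects, since the functions only iterate over lines. Each file is read once instead of re-reading Subset.txt for every id. With the sample data in the question, all three NEW ids appear in Subset.txt, so the result would be an empty set.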

Upvotes: 1
