Comparing content in two csv files

Question

I have two CSV files. Book1.csv has more data than similarities.csv, so I want to pull out the rows in Book1.csv that do not occur in similarities.csv. Here's what I have so far:

    with open('Book1.csv', 'rb') as csvMasterForDiff:
        with open('similarities.csv', 'rb') as csvSlaveForDiff:
            masterReaderDiff = csv.reader(csvMasterForDiff)
            slaveReaderDiff = csv.reader(csvSlaveForDiff)        

            testNotInCount = 0
            testInCount = 0
            for row in masterReaderDiff:
                if row not in slaveReaderDiff:
                    testNotInCount = testNotInCount + 1
                else :
                    testInCount = testInCount + 1


    print('Not in file: '+ str(testNotInCount))
    print('Exists in file: '+ str(testInCount))

However, the results are:

Not in file: 2093
Exists in file: 0

I know this is incorrect: at least the first 16 entries in Book1.csv do not exist in similarities.csv, but not all of them are missing. What am I doing wrong?

Eugene Yarmash · Accepted Answer

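A csv.reader object is an iterator, which means you can only iterate through it once. The first `row not in slaveReaderDiff` test consumes the reader while searching for a match (all of it, if no match is found), so every later test runs against an exhausted reader and reports the row as missing. Read the second file into a set or list once and use that for the containment checks, e.g.: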

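# Rows are lists, which are unhashable, so store them in the set as tuples.
slave_rows = set(tuple(row) for row in slaveReaderDiff)

for row in masterReaderDiff:
    # Convert each master row the same way before the lookup.
    if tuple(row) not in slave_rows:
        testNotInCount += 1
    else:
        testInCount += 1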
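
For reference, the same idea can be put together as a small self-contained sketch. It assumes Python 3 and exact, field-by-field row matching. The function name `split_rows` and the sample data are made up for illustration, and in-memory `io.StringIO` objects stand in for the two files:

```python
import csv
import io


def split_rows(master_file, other_file):
    """Split the rows of master_file into (missing, present) relative to other_file.

    split_rows is a hypothetical helper, not part of the csv module.
    """
    # Read the second file exactly once; tuples are hashable, lists are not.
    other_rows = {tuple(row) for row in csv.reader(other_file)}

    missing, present = [], []
    for row in csv.reader(master_file):
        if tuple(row) in other_rows:
            present.append(row)
        else:
            missing.append(row)
    return missing, present


# Made-up sample data standing in for Book1.csv and similarities.csv.
master = io.StringIO("a,1\nb,2\nc,3\n")
other = io.StringIO("b,2\n")

missing, present = split_rows(master, other)
print('Not in file: ' + str(len(missing)))
print('Exists in file: ' + str(len(present)))
```

With real files on Python 3, open them in text mode, e.g. `open('Book1.csv', newline='')`, rather than `'rb'`, and pass the file objects to the function. Returning the rows themselves rather than only the counts also gives you the rows you wanted to pull out.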
