Comparing Two CSV in Python

Question

I am formatting CSV in Python to get the desired result but my code doesn't seem to work properly.

I have the first CSV file in format:

2,a
1,a
4,a
5,a
3,a
1,a
3,b
2,b
1,a

The second CSV file in format:

1,a,123
1,a,234
2,a,456
2,b,345
3,a,789
3,b,232
4,a,987

As the first CSV file is not sorted, the second CSV file is sorted in increasing order with respect to first column

I want the output in format:

2,a,456
1,a,123
4,a,987
5,a
3,a,789
1,a,234
3,b,232
2,b,345
1,a

The result is printed in respect to the first CSV, if the first CSV file combination is not found in the second CSV File, for example if 5,a is not in the second CSV file then just 5,a is printed in its respected position. The first CSV file contains many duplicates, while in the second CSV file each row is unique.

Here is my Code

for (num,alpha) in first_csv:
    value_found = True
    for (num1,alpha1,num2) in second_csv:
        if (num == num1 and alpha == alpha1):
            csv_out +=  str(num) + ',' + str(alpha) + ',' + str(number)
            value_found = False
    if value_found:
        count+=1
        if count == 1:
            csv_out += str(num) + ',' + str(alpha)

first_csv and second_csv are tuples I have created after reading the CSV files with the code:

with open('first_csv.csv') as f:
    f.readline()
    first_csv = tuple(csv.reader(f, delimiter=','))


with open('second_csv.csv') as f:
    f.readline()
    second_csv = tuple(csv.reader(f, delimiter=','))

But it is not printing the desired output, where am I doing wrong?

Janne Karila · Accepted Answer

This collects the num2 values from second file into a dictionary of deques. When matches are found, they are removed using deque.popleft to use each match only once, in the same order as they occurred in the second file.

from collections import defaultdict, deque

with open('second_csv.csv') as f:
    next(f) #skip header
    dic = defaultdict(deque)
    for num1,alpha1,num2 in csv.reader(f, delimiter=','):
        dic[num1, alpha1].append(num2)

with open('first_csv.csv') as f, open('out.csv', 'wb') as fout:
    next(f) #skip header
    csv_out = csv.writer(fout)
    for num,alpha in csv.reader(f, delimiter=','):
        try:
            num2 = dic[num,alpha].popleft()
            csv_out.writerow([num,alpha,num2])
        except IndexError:
            csv_out.writerow([num,alpha])

a deque retains the order of items that you add by append and remove by popleft.
a dict does fast lookup by key which can be a tuple
a defaultdict(deque) is a dict that creates an empty deque automatically when you access a missing key, so you can append to it directly.

Comparing Two CSV in Python

Answers (2)

Related Questions