Wessel Jespers
Wessel Jespers

Reputation: 35

Python CSV: Find identical data in two CSV files and copy corresponding data

Hi Stack Overflow users,

I'm quite new to Python and have already made a lot of progress from all questions and answer. Many thanks to everyone contributing!

I have been trying to solve an issue for the last couple of days, but despite much research I still haven't been able to figure it out.

Say I have two CSV files containing the following data:

'File1.csv':

sku; title
01001; Product1
01002; Product2
01003; Product3

and 'File2.csv':

sku; color
01001; blue
01003; green
01005; red

What I would like my python program to do, is to add the missing data (color) for all SKU's in File1.csv. The output should be as follows:

'output1.csv':

sku; title; color
01001; Product1; blue
01002; Product2;
01003; Product3; green

The Python code I wrote:

import csv

f1 = open('file1.csv', 'r', newline='', encoding='UTF-8')
f2 = open('file2.csv', 'r', newline='', encoding='UTF-8')
f1_reader = csv.reader(f1, delimiter=';')
f2_reader = csv.reader(f2, delimiter=';')

i = 0
j = 0

for row in f1_reader:
    for line in f2_reader:
        if str(row[0]) == str(line[0]):
            print(str(i), str(j), 'Success', str(row[0]), str(line[0]), str(line[1]))
            # Use values to copy the color value to file1.csv
        j = j + 1
    i = i + 1

Unfortunately this code doesn't work. The only output I get is as follows:

1 1 Success sku sku color

I guess my logic might be a bit off here, but I expected more 'Successes'. Once Again, I am just a beginner in Python, so I would love to learn what is wrong with my code!

Any advice?

UPDATE

I deleted the newline='' and the if-statement to see what the problem might be:

for row in f1_reader:
    for line in f2_reader:
        print('Success', str(row[0]), str(line[0]), str(line[1]))

It turns out I get the following output:

Success sku sku  color
Success sku 01001  blue
Success sku 01003  green 
Success sku 01005  red

IndexError: list index out of range

The f1_reader doesn't iterate over all of its lines, while the f2_reader does. Is there any way to make my code iterate over both my files, checking for identical row[0]'s and line[0]'s?

SOLUTION

The following code works for me:

import csv

f1 = open('file1.csv', 'r')
f1_reader = csv.reader(f1, delimiter=';')

for row in f1_reader:
    with open('file2.csv', 'r') as f2:
        f2_reader = csv.reader(f2, delimiter=';')
        for line in f2_reader:
            try:
                if str(row[0]) == str(line[0]):
                    print('Success', str(row[0]), str(line[0]), str(line[1]))
            except IndexError: 
                continue

Returns:

Success sku sku  color
Success 01001 01001  blue
Success 01003 01003  green
Success 01005 01005  red

Upvotes: 2

Views: 55

Answers (1)

CodeCupboard
CodeCupboard

Reputation: 1585

The issue you have is that once you have read through f2 once, you are at the end of the file. Therefore it will only compare f2 with the first loop. You need a way to read f2 from the start of the file when you start the loop again. I have edited you code below so that is loops as you except it to:

import csv

f1 = open('file1.csv', 'r')
f1_reader = csv.reader(f1, delimiter=';')


i = 0
j = 0

for row in f1_reader:
    f2 = open('file2.csv', 'r')
    f2_reader = csv.reader(f2, delimiter=';')
    for line in f2_reader:

        if row[0] == line[0]:
            print(str(i), str(j), 'Success', str(row[0]), str(line[0]), str(line[1]))
            # Use values to copy the color value to file1.csv


        j = j + 1
    f2.close
    i = i + 1

Returns

0 0 Success sku sku  color
1 5 Success 01001 01001  blue
3 14 Success 01003 01003  green

Upvotes: 2

Related Questions