K. Allaei
K. Allaei

Reputation: 3

Using matching column values in two CSV files to create a new file with combined data

I am currently trying to compare two CSV files to check if IP addresses in the first column of file1.csv are in a row in file2.csv using Python 3.6. If the address is in file2, I need the second column value of that row copied to a new file that is identical to file 1. The two file setups look like this:

file 1:

XX.XXX.XXX.1,Test1
XX.XXX.XXX.2,Test2
XX.XXX.XXX.3,Test3
XX.XXX.XXX.4,Test4
XX.XXX.XXX.5,Test5
XX.XXX.XXX.6,Test6
XX.XXX.XXX.7,Test7
XX.XXX.XXX.8,Test8

and so on

file 2:

XX.XXX.XXX.6, Name6
XX.XXX.XXX.7, Name7
XX.XXX.XXX.8, Name8

I need the result.csv file to look like this:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8

The code I have so far is as follows:

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    for file2_row in file2:
        results_row = file1_row
        x = file2_row[3]
        if file1_row[1] == file2_row[1]:

        results_row.append('Found. Name: ' + x)
        found = True
        break
    row += 1
if not found:
    results_row.append('Not found in File1')
c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

Right now this code is checking for identical rows not values. Meaning it will not match anything as it requires both the IP column and adjacent column to be equal on both files, in addition it matches both files' row 1, row 2, row3, and so on, but I need it to search through one to find the matches in the other, not compare rows by index.

Upvotes: 0

Views: 3144

Answers (4)

martineau
martineau

Reputation: 123473

Here's a non-pandas solution (assuming you're using Python 3.x):

import csv

present = {}
with open('file2.csv', 'r', newline='') as file2:
    reader = csv.reader(file2, skipinitialspace=True)
    for ip, name in reader:
        present[ip] = name

with open('file1.csv', 'r', newline='') as file1, \
     open('results.csv', 'w', newline='') as results:
    reader = csv.reader(file1, skipinitialspace=True)
    writer = csv.writer(results)
    for ip, name in reader:
        writer.writerow([ip, name, present.get(ip, ' Not found')])

File Results.csv:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8

Upvotes: 0

Mike Müller
Mike Müller

Reputation: 85482

A pandas solution:

import pandas as pd

df1 = pd.read_csv('file_1.csv', names=['a', 'b'])
df2 = pd.read_csv('file_2.csv', names=['a', 'b'])
merged = pd.merge(df1, df2, on='a', how='outer')
merged.to_csv('results.csv', header=False, index=False, na_rep='Not found')

​Content of results.csv:

XX.XXX.XXX.1,Test1,Not found
XX.XXX.XXX.2,Test2,Not found
XX.XXX.XXX.3,Test3,Not found
XX.XXX.XXX.4,Test4,Not found
XX.XXX.XXX.5,Test5,Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8

Upvotes: 1

ettanany
ettanany

Reputation: 19806

A close solution to what you have tried would be as follows:

with open('result.csv', 'w') as out:
    with open('file1.csv', 'r') as f1, open('file2.csv', 'r') as f2:
        f2_lines = [line for line in f2.readlines() if len(line) > 1]
        f1_lines = [line for line in f1.readlines() if len(line) > 1]
        for line in f1_lines:
            val = 'Not found'
            b = [line.split(',')[0].strip() in item for item in f2_lines]
            if any(b):
                val = f2_lines[b.index(True)].split(',')[1].strip()
            out.write('{}, {}\n'.format(line.strip(), val))

Output:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8

Upvotes: 0

fean
fean

Reputation: 546

I moved the position of results_row and changed indent after row+=1

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    results_row = file1_row  #Moved out from nested loop
    for file2_row in file2:        
        x = file2_row[1]
        if file1_row[0] == file2_row[0]:
            results_row.append(x)
            found = True
            break
    row += 1
    if not found:
        results_row.append('Not found')     
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

Upvotes: 0

Related Questions