Huan Ren
Huan Ren

Reputation: 15

Python: how to update a csv file from another csv file

We have two CSV files: a.csv and b.csv.

a.csv has tree columns: label, item1, item2. b.csv has two columns: item1, item2. If item1 and item2 in a.csv also occurr in b.csv, that's a.csv and b.csv have same item1 and item2, the value of label in a.csv should be 1 instead.

For example:

a.csv:

label    item1     item2
0         123       35
0         342       721
0         876       243

b.csv:

item1     item2
 12        35
 32        721
 876       243

result.csv:

label    item1     item2
0         123       35
0         342       721
1         876       243

Upvotes: 0

Views: 1788

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1121196

Read your a.csv into a dictionary; use a tuple of (item1, item2) as the key. Then when reading b.csv you can update the label for each entry in the dictionary as you process the file.

After this process, write out result.csv from the information in the dictionary.

import csv

rows = {}
with open('a.csv', 'r', newline='') as acsv:
    areader = csv.DictReader(acsv)
    for row in reader:
        # store the row based on the item1 and item2 columns
        key = (row['item1'], row['item2'])
        rows[key] = row

with open('b.csv', 'r', newline='') as bcsv:
    breader = csv.DictReader(bcsv)
    for row in reader:
        # set the label of matching rows to 1 when present
        key = (row['item1'], row['item2'])
        if key in rows:
            rows[key]['label'] = 1

with open('result.csv', 'w', newline='') as result:
    writer = csv.DictReader(result, fieldnames=areader.fieldnames)
    writer.writerows(rows.values())

I used csv.DictReader() objects to ease column name handling. Each row is presented as a dictionary, with the keys taken from the first row in the CSV file.

I also assumed you are using Python 3; if you are using Python 2, you'll have to adjust the open() calls to remove the newline='' argument, and you need to use binary mode ('rb' and 'wb'). I did not specify a codec for the files; currently the default system codec will be used to read and write. If that is incorrect, add encoding='...' arguments.

Upvotes: 2

Related Questions