Reputation: 15
We have two CSV files: a.csv
and b.csv
.
a.csv
has tree columns: label, item1, item2. b.csv
has two columns: item1, item2. If item1 and item2 in a.csv
also occurr in b.csv
, that's a.csv
and b.csv
have same item1 and item2, the value of label in a.csv
should be 1 instead.
For example:
a.csv
:
label item1 item2
0 123 35
0 342 721
0 876 243
b.csv
:
item1 item2
12 35
32 721
876 243
result.csv
:
label item1 item2
0 123 35
0 342 721
1 876 243
Upvotes: 0
Views: 1788
Reputation: 1121196
Read your a.csv
into a dictionary; use a tuple of (item1, item2)
as the key. Then when reading b.csv
you can update the label for each entry in the dictionary as you process the file.
After this process, write out result.csv
from the information in the dictionary.
import csv
rows = {}
with open('a.csv', 'r', newline='') as acsv:
areader = csv.DictReader(acsv)
for row in reader:
# store the row based on the item1 and item2 columns
key = (row['item1'], row['item2'])
rows[key] = row
with open('b.csv', 'r', newline='') as bcsv:
breader = csv.DictReader(bcsv)
for row in reader:
# set the label of matching rows to 1 when present
key = (row['item1'], row['item2'])
if key in rows:
rows[key]['label'] = 1
with open('result.csv', 'w', newline='') as result:
writer = csv.DictReader(result, fieldnames=areader.fieldnames)
writer.writerows(rows.values())
I used csv.DictReader()
objects to ease column name handling. Each row is presented as a dictionary, with the keys taken from the first row in the CSV file.
I also assumed you are using Python 3; if you are using Python 2, you'll have to adjust the open()
calls to remove the newline=''
argument, and you need to use binary mode ('rb'
and 'wb'
). I did not specify a codec for the files; currently the default system codec will be used to read and write. If that is incorrect, add encoding='...'
arguments.
Upvotes: 2