Reputation: 33
I have two CSV files, CSV1
and CSV2
, with multiple columns and rows. One of the columns is titled ID
.
What I want to do is check CSV1
's rows and if there is an ID
in CSV2
's ID
column that matches CSV1
's rows ID
, then I want to keep that row. If there's no match, then I want to delete the row from CSV1
.
Basically the data in CSV1
is relevant to me, but only the people that are in CSV2
. The unique connection between the two is the ID
column. So I need to check all the rows in CSV1
to see if that Row ID
is in a row in CSV2
.
Here's what I have so far.
import csv
smarteeCSV = open("Smartee.csv", "r")
aeriesCSV = open("aeriesEditable.csv", "r+")
aeries = csv.reader(aeriesCSV, delimiter=',')##CSV1
smartee = csv.reader(smarteeCSV, delimiter=',')##CSV2
for row in aeries:
for item in smartee
if row[1] != item[1]##indexes for the columns with the ids
I can already tell I'm not on the right track, so can anyone assist?
Upvotes: 1
Views: 1915
Reputation: 123473
Depending on what you plan on doing with the relevant rows of data, you might be able to use Python's built-in filter()
function to do what you need:
import csv
# first get the ids
with open('Smartee.csv', 'rb') as smarteeCSV: # CSV2
ids = set(row['ID'] for row in csv.DictReader(smarteeCSV, delimiter=','))
with open('aeriesEditable.csv', 'rb') as aeriesCSV: # CSV1
relevant = filter(lambda row: if row['ID'] in ids,
csv.DictReader(aeriesCSV, delimiter=','))
# relevant will be a list containing the desired rows from CSV1
If you want to iteratively process the rows, for the second part you could use a for
loop over the results of calling the itertools.ifilter()
function similarly instead.
Upvotes: 1
Reputation: 2826
You could extract all of the ID's in the second file and look those up every time you check one of the rows of the first file.
For example:
# extract ID column from CSV file 2 into a set
Ids = { row[1] for row in smartee }
# pick only rows whose ID is in Ids
filtered_rows = [item for item in aeries if item[1] in Ids]
Upvotes: 3
Reputation: 881853
First, read CSV2 to make just a set of IDs:
with open(CSV2) as f:
r = csv.DictReader(f)
theids = set(row['ID'] for row in r)
Then, while reading CSV1, just check if the ID is in the set:
with open(CSV1) as f, open(CSV1 + '.new', 'w') as out:
r = csv.DictReader(r)
w = csv.DictWriter(out, r.fieldnames)
for row in r:
if row['ID'] in theids:
w.writerow(row)
This assumes the CSV files are suitable for dict-based read/write (i.e first row is a list of column names), but is easy to tweak if the col names come from other information too.
Upvotes: 0