Reputation: 223
I have a large CSV file from which I need to remove rows, based on matching two parameters.
The list of entries to be removed looks like this:
London,James Smith
London,John Oliver
London,John-Smith-Harrison
Paris,Hermione
Paris,Trevor Wilson
New York City,Charlie Chaplin
New York City,Ned Stark
New York City,Thoma' Becket
New York City,Ryan-Dover
A row in the main CSV should be removed when the city name matches its 2nd column and the name matches its 9th column.
If both match, delete the row from the main CSV (note that I haven't included an example of that CSV here).
Upvotes: 1
Views: 5141
Reputation: 38207
I verified that the following works as you need on the kind of data you described:
import csv
from cStringIO import StringIO

# parse the data you're about to filter with
with open('filters.csv', 'rb') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'wb') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go thru your rows and see if the pair (row[1], row[8]) is
# found in the previously parsed set of filters; if yes, skip the row
with open('data.csv', 'rb') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print out_f.getvalue()  # prints the resulting filtered CSV data
NOTE: the {... for ... in ...} is set-comprehension syntax; depending on your Python version, you might need to change this to the equivalent set(... for ... in ...) for it to work.
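The code above is Python 2 (cStringIO, the print statement, binary file modes). On Python 3 the same idea might look like the sketch below. The function name filter_rows, its parameters, and the in-memory sample rows are assumptions made for illustration, not part of the original answer.

```python
import csv
import io

def filter_rows(filter_lines, data_lines, out_file):
    """Copy data rows to out_file unless (2nd column, 9th column) is a filter pair."""
    # Each non-empty filter row is a (city, name) pair.
    filters = {(row[0], row[1]) for row in csv.reader(filter_lines) if row}
    writer = csv.writer(out_file, lineterminator='\n')
    for row in csv.reader(data_lines):
        # Rows too short to have a 9th column cannot match, so they are kept.
        if len(row) < 9 or (row[1], row[8]) not in filters:
            writer.writerow(row)

# In-memory demo with made-up rows; only the city/name columns matter here.
filter_src = io.StringIO("London,James Smith\nParis,Hermione\n")
data_src = io.StringIO(
    "1,London,a,b,c,d,e,f,James Smith\n"
    "2,London,a,b,c,d,e,f,Hermione\n"
    "3,Paris,a,b,c,d,e,f,Hermione\n"
)
out = io.StringIO()
filter_rows(filter_src, data_src, out)
print(out.getvalue())  # only row 2 survives: its city and name are not a listed pair
```

With real files, you would pass file objects opened in text mode with newline='' (as the Python 3 csv documentation recommends), for example open('filters.csv', newline='') for the inputs and open('out.csv', 'w', newline='') for the output.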
Upvotes: 5
Reputation: 26335
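You can read your data line by line and append each line to a list unless its 2nd-column value is in list L1 (the cities) and its 9th-column value is in list L2 (the names). Both lists must be built beforehand.
ext = r"C:\Users\Me\Desktop\test.txt"  # raw string so the backslashes are kept literally
readL = []
with open(ext) as f:
    for line in f:
        listLine = line.strip().split(',')
        # indices are 0-based: the 2nd column is [1], the 9th is [8]
        if listLine[1] in L1 and listLine[8] in L2:
            continue  # both matched, so skip this row
        readL.append(listLine)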
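One caveat with checking the city and the name against two separate lists: it is not quite the same as matching them as a pair. A row whose city is listed and whose name is listed under a different city would be treated as a match too. A small sketch of the difference, using two entries from the question's list (the names pairs, cities, and names are hypothetical):

```python
# Filter entries as (city, name) pairs, taken from the question's sample list.
pairs = {("London", "James Smith"), ("Paris", "Hermione")}

# The same entries split into two independent collections (like L1 and L2).
cities = {city for city, _ in pairs}
names = {name for _, name in pairs}

# A row whose city and name are both listed, but not together.
city, name = "London", "Hermione"

matches_separately = city in cities and name in names
matches_as_pair = (city, name) in pairs

print(matches_separately)  # True
print(matches_as_pair)     # False
```

If the rows must be removed only when city and name belong to the same filter entry, a set of tuples is the safer structure.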
Upvotes: 1