How to remove rows from csv based on matching data

Question

I have a large CSV file from which I need to remove rows, based on matching two values.

The list of entries to be removed looks like this:

London,James Smith
London,John Oliver
London,John-Smith-Harrison
Paris,Hermione
Paris,Trevor Wilson
New York City,Charlie Chaplin
New York City,Ned Stark
New York City,Thoma' Becket
New York City,Ryan-Dover

A row in the main CSV should be removed when the city name matches its second column and the name matches its ninth column.

If both match, delete the row from the main CSV (note that I haven't provided an example of the main CSV here).

Erik Kaplun · Accepted Answer

I verified that the following (Python 2) code works on the kind of data you provided/described:

import csv
from cStringIO import StringIO

# parse the data you're about to filter with
with open('filters.csv', 'rb') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'wb') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go through your rows and check whether the pair (row[1], row[8]), i.e. the
# 2nd and 9th columns, is in the previously parsed set of filters; if yes, skip the row
with open('data.csv', 'rb') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print out_f.getvalue()  # prints the resulting filtered CSV data

NOTE: the {... for ... in ...} is set-comprehension syntax, which requires Python 2.7 or later; on older versions, change it to the equivalent set(... for ... in ...) for it to work.
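For readers on Python 3, where cStringIO and the print statement no longer exist, the same approach might look roughly like the sketch below. The function names load_filters and filter_rows, the city_col and name_col parameters, and the sample rows are all made up for illustration; the only thing taken from the question is that the city sits in the 2nd column and the name in the 9th.

```python
import csv
import io


def load_filters(filter_file):
    """Return the set of (city, name) pairs that mark rows for removal."""
    # Skip blank or short lines so they cannot raise IndexError.
    return {(row[0], row[1]) for row in csv.reader(filter_file) if len(row) >= 2}


def filter_rows(data_file, out_file, filters, city_col=1, name_col=8):
    """Copy rows from data_file to out_file, skipping rows whose
    (city, name) pair is in filters. Returns the number of rows removed."""
    writer = csv.writer(out_file)
    removed = 0
    for row in csv.reader(data_file):
        # Rows too short to hold both columns cannot match, so they are kept.
        long_enough = len(row) > max(city_col, name_col)
        if long_enough and (row[city_col], row[name_col]) in filters:
            removed += 1
            continue
        writer.writerow(row)
    return removed


# In-memory demonstration with hypothetical 9-column rows.
filters = load_filters(io.StringIO("London,James Smith\nParis,Hermione\n"))
data = io.StringIO(
    "1,London,a,b,c,d,e,f,James Smith\n"
    "2,London,a,b,c,d,e,f,Jane Doe\n"
    "3,Paris,a,b,c,d,e,f,Hermione\n"
)
result = io.StringIO()
removed_count = filter_rows(data, result, filters)
print(result.getvalue())
```

With real files, open them in text mode with newline='' as the csv module documentation recommends, e.g. open('filters.csv', newline='') for reading and open('out.csv', 'w', newline='') for writing. Note that the match is exact: differences in case or surrounding whitespace between the two files would prevent a row from being removed.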
