DotPi

Reputation: 4137

Remove entries from CSV file based on a list in Python

I have a CSV file which has the following content:

Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Fun
Apple,Gravy
Apple,Hot
Bat,Cat
Bat,Dry
Bat,Fun
...

I also have a list as follows:

to_remove = ['Fun', 'Gravy', ...]

I would like an efficient way to delete all lines from the CSV file that contain any of the words in the list to_remove.

I know one way to do it is to read each line of the CSV file, loop through to_remove to check whether any of its words appear in the line, and write the line to another file if there is no match.

However, I have a lot of entries in both the CSV file and the to_remove list (approx. 21000 and 300 respectively), so I want an efficient way of doing it in Python.

I do not have access to a cluster, so map-reduce based approaches are not an option.

Upvotes: 0

Views: 164

Answers (1)

alvas

Reputation: 122280

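toremove = ['Fun', 'Gravy']
with open('test.in', 'r') as fin, open('test.out', 'w') as fout:
    # Iterate over the file line by line instead of loading it all with readlines().
    for line in fin:
        # Keep the line only if none of the words in toremove occurs in it.
        if not any(word in line for word in toremove):
            fout.write(line)

# Print the filtered file to check the result.
with open('test.out') as fout:
    print(fout.read())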

test.in:

Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Fun
Apple,Gravy
Apple,Hot
Bat,Cat
Bat,Dry
Bat,Fun

test.out:

Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Hot
Bat,Cat
Bat,Dry
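
Note that the code above does a substring test, so a word such as 'Fun' would also remove a line containing 'Funny'. If the words are meant to match whole fields only, one possible variant is to parse each row with the csv module and compare its fields against a set. This is only a sketch under that assumption; filter_rows is a hypothetical helper name, not something from the question.

```python
import csv

def filter_rows(in_path, out_path, to_remove):
    """Copy rows from in_path to out_path, skipping any row with a field in to_remove."""
    banned = set(to_remove)  # set lookups take constant time on average
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        # lineterminator='\n' avoids the csv module's default '\r\n' line endings
        writer = csv.writer(fout, lineterminator='\n')
        for row in csv.reader(fin):
            # isdisjoint is True when no field of the row is in the banned set
            if banned.isdisjoint(row):
                writer.writerow(row)
```

Calling filter_rows('test.in', 'test.out', ['Fun', 'Gravy']) on the sample input should give the same output as shown above. With about 21000 rows and 300 words, either version is a single pass over the file; the set mainly saves the inner loop over the word list.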

Upvotes: 1
