Reputation: 4137
I have a CSV file which has the following content:
Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Fun
Apple,Gravy
Apple,Hot
Bat,Cat
Bat,Dry
Bat,Fun
...
I also have a list as follows:
to_remove=[Fun,Gravy,...]
I would like an efficient way to delete every line from the CSV file that contains any of the words in to_remove.
I know one way to do it: read each line of the CSV file, loop through to_remove to check whether any of the words appears in the line, and write the line to another file if there is no match.
However, both the CSV file and the to_remove list have a lot of entries (approximately 21000 and 300 respectively), so I want an efficient way of doing this in Python.
I do not have access to a cluster, so MapReduce-based approaches are not an option.
Upvotes: 0
Views: 164
Reputation: 122280
toremove = ['Fun', 'Gravy']
# Copy only the lines that contain none of the words in toremove.
with open('test.in', 'r') as fin, open('test.out', 'w') as fout:
    for line in fin:
        if not any(word in line for word in toremove):
            fout.write(line)
# Print the filtered file to check the result.
with open('test.out') as fout:
    print(fout.read())
test.in:
Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Fun
Apple,Gravy
Apple,Hot
Bat,Cat
Bat,Dry
Bat,Fun
[out:]
Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Hot
Bat,Cat
Bat,Dry
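One caveat: testing a word with `in` against the whole line is a substring match, so 'Fun' would also drop a line containing 'Funny'. If exact field matches are what you want, a possible variant is to parse each row with the csv module and compare its fields against a set, which makes each lookup roughly constant time instead of scanning all 300 words per line. This is only a sketch under that assumption; the names filter_rows and filter_csv are made up for illustration.

```python
import csv


def filter_rows(rows, to_remove):
    """Yield the rows that have no field exactly equal to a word in to_remove."""
    banned = set(to_remove)  # set membership tests are O(1) on average
    for row in rows:
        # isdisjoint is True when the row shares no field with the banned set
        if banned.isdisjoint(row):
            yield row


def filter_csv(in_path, out_path, to_remove):
    """Hypothetical helper: copy in_path to out_path without the matching rows."""
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        csv.writer(fout).writerows(filter_rows(csv.reader(fin), to_remove))
```

Calling filter_csv('test.in', 'test.out', ['Fun', 'Gravy']) on the sample input should keep the same lines as the output listed above. With about 21000 rows and 300 words, either version should finish quickly on a single machine.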
Upvotes: 1