Reputation: 13
I've got a comma-delimited text file with content that's kinda like this:
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Let's call it emails1.csv. I've got another comma-delimited text file too:
[email protected], [email protected]
Let's call it emails2.csv. I need to subtract emails2.csv from emails1.csv using Python. In pseudocodenese:
emails1.csv = emails1.csv - emails2.csv
I'm totally new to Python, but I made this based on a couple of examples I found. Does it do what I think it does? That is, take the emails in emails2.csv out of emails1.csv and put the difference in a file called subtractomatic.csv.
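import csv

# Python 3 version: the csv module wants text mode with newline=''.
# skipinitialspace=True drops the space that follows each comma,
# so ' a@b.c' and 'a@b.c' compare as the same address.
with open('emails1.csv', newline='') as fin:
    email_list1 = next(csv.reader(fin, skipinitialspace=True))
with open('emails2.csv', newline='') as fin:
    email_list2 = next(csv.reader(fin, skipinitialspace=True))

# Set difference: addresses in emails1.csv that are not in emails2.csv.
# This also removes duplicates and does not preserve the original order.
email_list1 = list(set(email_list1) - set(email_list2))

with open('subtractomatic.csv', 'w', newline='') as fout:
    writer = csv.writer(fout, quoting=csv.QUOTE_NONE)
    writer.writerow(email_list1)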
I think it does, because my original file, emails1.csv, has X emails in it, and when I open subtractomatic.csv there are emails in it, and when I run
grep @ -o subtractomatic.csv | wc -l
in the terminal I get X/2, which makes sense because emails1.csv has twice as many emails as emails2.csv, by design. I am, however, also a novice, so I don't know that I'm looking at this thing right.
Upvotes: 1
Views: 76
Reputation: 309929
Rather than the all-set approach used by others, you can make B a set and filter its contents out of A:
b_set = set(B)
a_filtered = [a for a in A if a not in b_set]
This has the advantage of keeping the order of A in a_filtered (sans the elements you want to remove).
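To make that concrete, here is a minimal sketch. The addresses are invented placeholders, and the function name subtract_keep_order is just something I made up for illustration; neither comes from the question.

```python
def subtract_keep_order(a, b):
    """Return the items of a that are not in b, in their original order."""
    b_set = set(b)  # set membership tests are O(1) on average
    return [item for item in a if item not in b_set]


# Placeholder data, invented for this example.
A = ["carol@example.com", "alice@example.com", "bob@example.com", "dave@example.com"]
B = ["alice@example.com", "dave@example.com"]

a_filtered = subtract_keep_order(A, B)
print(a_filtered)  # ['carol@example.com', 'bob@example.com']
```

Note that, unlike a pure set difference, this keeps any duplicates that A contains. Whether that is what you want depends on your data.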
Upvotes: 0
Reputation: 174706
Use sets to find the difference between the two lists, then assign the result back to the first list. Python's built-in set type is an unordered collection of unique elements (the old sets module is deprecated, so there is no need to import it). Common uses include membership testing, removing duplicates from a sequence, and computing standard math operations such as intersection, union, difference, and symmetric difference.
>>> l1 = ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
>>> l2 = ['[email protected]', '[email protected]']
>>> set(l1)-set(l2)
{'[email protected]', '[email protected]', '[email protected]', '[email protected]'}
>>> list(set(l1)-set(l2))
['[email protected]', '[email protected]', '[email protected]', '[email protected]']
>>> l1 = list(set(l1)-set(l2))
>>> l1
['[email protected]', '[email protected]', '[email protected]', '[email protected]']
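One caveat: a set has no order and collapses duplicates, so the list you get back may not follow the order of l1. Here is a small sketch of that, using invented placeholder addresses (not the ones from the question) and sorted() to get a predictable result:

```python
# Placeholder data, invented for this example; note the duplicate entry.
l1 = ["carol@example.com", "alice@example.com", "bob@example.com", "carol@example.com"]
l2 = ["alice@example.com"]

diff = set(l1) - set(l2)  # same result as set(l1).difference(l2)

# Sets are unordered, so sort when a stable order is needed.
result = sorted(diff)
print(result)  # ['bob@example.com', 'carol@example.com']
```

The duplicate address appears only once in the result, which is usually what you want for a mailing list.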
Upvotes: 2