Reputation: 904
I'm trying to compare two Excel documents. Each has around 6000 rows and 4 columns: the first column is a domain name and the other three are comments. One of the documents has updated comments in some of the columns, and eventually I would like this script to work as a batch update that replaces the old, outdated comments with the new ones.
The code I have written so far opens the documents and adds them to two separate lists:
import csv
newlist = csv.reader(open('newcomments.csv','rU'), dialect='excel')
export = csv.reader(open('oldcomments.csv', 'rU'), dialect='excel')
for row in newlist:
    olddomain=[]
    domain = row[0:]
    olddomain.append(domain)
    for item in olddomain:
        print item
for row in export:
    newdomain=[]
    domain= row[0:]
    newdomain.append(domain)
    for item in newdomain:
        print item
The output from the lists looks like this (the second column is normally blank):
['example.com', '', 'excomment', 'Parked Page']
When trying to compare the lists, I have tried something like:
if item in olddomain != item in newdomain:
    print "no match"
else:
    print "match"
But that doesn't appear to work. For example, the first row in the two files contains the exact same data, but the code returns "no match"; the second row in both files also contains the same data, but the code returns "match".
Is the problem with the way I am saving the rows to the list, or is there something else I'm missing? I assume there is a better way of doing this, but I'm using it as an excuse to learn more Python!
Thanks for your time.
Upvotes: 3
Views: 16015
Reputation: 153
Try converting both lists to sets and using the & (intersection) operation.
Example:
In [1]: a = ['a' , 'b', 'c']
In [2]: b = ['b' , 'a', 'c']
In [3]: set(a) & set(b)
Out[3]: {'a', 'b', 'c'}
In [4]: set(b) == set(a) & set(b)
Out[4]: True
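One caveat when applying this to the rows from the question: each row is a list, and lists cannot be stored in a set. A minimal sketch, assuming each row is converted to a tuple first (the sample rows below are made up for illustration, and the syntax is Python 3):

```python
# Rows shaped like the ones csv.reader yields in the question (made-up data).
old_rows = [
    ['example.com', '', 'excomment', 'Parked Page'],
    ['example.org', '', 'old comment', ''],
]
new_rows = [
    ['example.com', '', 'excomment', 'Parked Page'],
    ['example.org', '', 'updated comment', ''],
]

# Lists are unhashable, so convert each row to a tuple before building a set.
old_set = set(tuple(row) for row in old_rows)
new_set = set(tuple(row) for row in new_rows)

unchanged = old_set & new_set        # rows identical in both files
changed_or_new = new_set - old_set   # rows that appear only in the new file
```

Here changed_or_new would hold the rows whose comments differ from the old file, which is roughly the set of rows a batch update needs to touch.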
Upvotes: 0
Reputation: 43077
It seems like you are trying to compare an old list of domain names to a new list of domain names. After those lists have been built, you want to see whether there is commonality between the lists.
In this case, I think a set() offers much richer functionality that makes your life easier. Example:
>>> olddomains = set(['www.cisco.com', 'www.juniper.com', 'www.hp.com'])
>>> newdomains = set(['www.microsoft.com', 'www.cisco.com', 'www.apple.com'])
>>> olddomains.intersection(newdomains)
set(['www.cisco.com'])
>>>
>>> 'www.google.com' in newdomains
False
>>>
Rewriting part of your code to use a set would look like this:
# retain newlist, since that's the output from csv...
olddomain = set()  # create the set once, before the loop, so it is not reset on every row
for row in newlist:
    domain = row[0]
    olddomain.add(domain.lower())  # use lower() to ensure no CAPS mess things up
for item in olddomain:
    print item
And the code you asked about:
if olddomain.intersection(newdomain) == set([]):
    print "no match"
else:
    print "match"
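Putting the pieces together, here is a rough end-to-end sketch in Python 3 syntax. The helper name load_domains is hypothetical, and the in-memory strings stand in for newcomments.csv and oldcomments.csv with made-up contents; with real files you would pass open file objects instead.

```python
import csv
import io

def load_domains(fileobj):
    """Return the set of lower-cased domain names from the first CSV column."""
    # Hypothetical helper, not part of the original code; skips empty rows.
    return {row[0].lower() for row in csv.reader(fileobj, dialect='excel') if row}

# In-memory stand-ins for the two CSV files (made-up data).
new_file = io.StringIO("Example.com,,excomment,Parked Page\nexample.net,,new,\n")
old_file = io.StringIO("example.com,,excomment,Parked Page\nexample.org,,old,\n")

newdomains = load_domains(new_file)
olddomains = load_domains(old_file)

common = olddomains & newdomains     # domains present in both files
only_new = newdomains - olddomains   # domains that appear only in the new file

if common:
    print("match")
else:
    print("no match")
```

Because the names are lower-cased on the way in, Example.com and example.com count as the same domain.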
The general rule I use when determining whether I use a set() or a list():
list()
set()
Since you're asking why the code I posted throws a TypeError, if you are assigning row the same way I am, then you need to use row[0] instead of row[0:]:
>>> row = ['example.com', '', 'excomment', 'Parked Page']
>>> row[0:]
['example.com', '', 'excomment', 'Parked Page']
>>> row[0]
'example.com'
>>>
I changed my example to reflect this, since I suspect that is where the issue lies.
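To see where the TypeError comes from, here is a small sketch (Python 3 syntax, using the sample row from the question): row[0:] is a list, and a list cannot be added to a set, whereas row[0] is a plain string.

```python
row = ['example.com', '', 'excomment', 'Parked Page']
domains = set()

try:
    domains.add(row[0:])   # row[0:] is a list copy, and lists are unhashable
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

domains.add(row[0])        # row[0] is a str, which is hashable
```

After this runs, domains contains only the string from the first column.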
Upvotes: 8
Reputation: 287755
You are most likely just missing parentheses. Note that the following two lines are equivalent, because in and != are both comparison operators with the same precedence, and Python chains comparison operators:
if item in olddomain != item in newdomain:
if (item in olddomain) and (olddomain != item) and (item in newdomain):
You probably want:
if (item in olddomain) != (item in newdomain):
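A small sketch of the difference, in Python 3 syntax with made-up lists. Python treats in and != as comparison operators and chains them, so the unparenthesized form is evaluated as (item in olddomain) and (olddomain != item) and (item in newdomain):

```python
olddomain = ['example.com', 'example.org']
newdomain = ['example.com', 'example.net']
item = 'example.org'   # present in the old list only

# Unparenthesized: a comparison chain, equivalent to
# (item in olddomain) and (olddomain != item) and (item in newdomain)
chained = item in olddomain != item in newdomain

# Parenthesized: compares the two membership results with each other.
explicit = (item in olddomain) != (item in newdomain)

print(chained, explicit)
```

With these values the chained form is False, because the last link (item in newdomain) fails, while the parenthesized form is True, because the item is in one list but not the other.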
Upvotes: 3