Christopher Long

Reputation: 904

Comparing a list of strings to a list of strings (python)

I'm trying to compare two Excel documents. Each has around 6000 rows and 4 columns: the first column is a domain name and the other three are comments. One of the documents has updated comments in some of the columns, and eventually I would like this script to work as a batch update that replaces the old, outdated comments with the new ones.

The code I have written so far opens the documents and adds them to two separate lists:

import csv

newlist = csv.reader(open('newcomments.csv','rU'), dialect='excel')
export = csv.reader(open('oldcomments.csv', 'rU'), dialect='excel')

for row in newlist:
    olddomain=[]
    domain = row[0:]
    olddomain.append(domain)
    for item in olddomain:
        print item

    for row in export:
        newdomain=[]
        domain= row[0:]
        newdomain.append(domain)
        for item in newdomain:
            print item

The output from the lists looks like this (the second column is normally blank):

['example.com', '', 'excomment', 'Parked Page']

When trying to compare the lists, I have tried something like:

if item in olddomain != item in newdomain:
    print "no match"
else:
    print "match"

But that doesn't appear to work. For example, the first row in the two files contains exactly the same data, but the code returns "no match"; the second row in both files also contains the same data, but the code returns "match".

Is the problem with the way I am saving the rows to the lists, or is there something else I'm missing? I assume there is a better way of doing this, but I'm using it as an excuse to learn more Python!

Thanks for your time.

Upvotes: 3

Views: 16015

Answers (3)

user3256363

Reputation: 153

Try converting both lists to sets and using the & (intersection) operator.

Example:

In [1]: a = ['a' , 'b', 'c']

In [2]: b = ['b' , 'a', 'c']

In [3]: set(a) & set(b)

Out[3]: {'a', 'b', 'c'}

In [4]: set(b) == set(a) & set(b)

Out[4]: True
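Applied to the CSV rows from the question, the same idea might look like the sketch below. The sample rows are hypothetical stand-ins for the two files, and the code is written for Python 3. Because each row is a list, and lists are unhashable, the rows are converted to tuples before being placed in a set.

```python
# Hypothetical sample rows standing in for oldcomments.csv and newcomments.csv.
old_rows = [
    ["example.com", "", "excomment", "Parked Page"],
    ["example.org", "", "old note", ""],
]
new_rows = [
    ["example.com", "", "excomment", "Parked Page"],
    ["example.org", "", "new note", ""],
]

# Lists cannot be set members, so each row becomes a tuple first.
old_set = set(tuple(row) for row in old_rows)
new_set = set(tuple(row) for row in new_rows)

unchanged = old_set & new_set   # rows that are identical in both files
changed = new_set - old_set     # rows that are new or whose comments differ

print(unchanged)
print(changed)
```

Note that this compares whole rows, so a row with any edited comment counts as changed.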

Upvotes: 0

Mike Pennington

Reputation: 43077

It seems like you are trying to compare an old list of domain names to a new list of domain names. After those lists have been built, you want to see whether there is commonality between the lists.

In this case, I think a set() offers much richer functionality that makes your life easier. Example:

>>> olddomains = set(['www.cisco.com', 'www.juniper.com', 'www.hp.com'])
>>> newdomains = set(['www.microsoft.com', 'www.cisco.com', 'www.apple.com'])
>>> olddomains.intersection(newdomains)
set(['www.cisco.com'])
>>>
>>> 'www.google.com' in newdomains
False
>>>

Rewriting part of your code to use a set would look like this:

# retain newlist, since that's the output from csv...
olddomain = set()   # create the set once, before the loop, so it collects every domain
for row in newlist:
    domain = row[0]
    olddomain.add(domain.lower())   # use lower() to ensure no CAPS mess things up
for item in olddomain:
    print item

And the code you asked about:

if olddomain.intersection(newdomain) == set([]):
    print "no match"
else:
    print "match"

The general rule I use when determining whether I use a set() or a list():

  • If retaining the ordering of the elements matters (including being able to access them by index), use a list()
  • In any other case, use a set()
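A small illustration of that rule of thumb, written for Python 3 with made-up domain names: the list keeps order, duplicates, and positions, while the set keeps only unique members.

```python
domains = ["www.cisco.com", "www.hp.com", "www.cisco.com"]

# A list keeps insertion order and duplicates, and supports indexing.
first = domains[0]

# A set drops duplicates and has no positions, but supports membership tests
# and operations such as intersection.
unique = set(domains)
has_hp = "www.hp.com" in unique

print(first, len(domains), len(unique), has_hp)
```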

EDIT

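Since you're asking why the code I posted throws a TypeError: if you are assigning row the same way I am, then you need to use row[0] instead of row[0:]. The slice row[0:] returns a copy of the whole row as a list, and a list is unhashable, so it cannot be added to a set; row[0] is just the domain string.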

>>> row = ['example.com', '', 'excomment', 'Parked Page']
>>> row[0:]
['example.com', '', 'excomment', 'Parked Page']
>>> row[0]
'example.com'
>>> 

I changed my example to reflect this, since I suspect that is where the issue lies.
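The difference can be reproduced in isolation. This is a minimal sketch for Python 3, using the sample row from the question; the variable names are only illustrative.

```python
row = ["example.com", "", "excomment", "Parked Page"]
domains = set()

try:
    domains.add(row[0:])   # the slice is a list, and lists are unhashable
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

domains.add(row[0])        # the first element is a string, which is hashable
print(domains)
```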

Upvotes: 8

phihag

Reputation: 287755

You are most likely just missing parentheses. Both in and != are comparison operators with equal precedence, so Python chains them rather than evaluating them one after another. The following two lines are therefore equivalent:

if item in olddomain != item in newdomain:
if (item in olddomain) and (olddomain != item) and (item in newdomain):

You probably want:

if (item in olddomain) != (item in newdomain):
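To see the effect of the parentheses, here is a small sketch for Python 3 with made-up single-element lists. The item is present in both lists, yet the unparenthesized condition is still true, because Python evaluates it as a chain of comparisons joined by and.

```python
olddomain = ["example.com"]
newdomain = ["example.com"]
item = "example.com"

# Without parentheses this is a chained comparison:
# (item in olddomain) and (olddomain != item) and (item in newdomain)
chained = item in olddomain != item in newdomain

# With parentheses, the two membership results are compared with each other.
explicit = (item in olddomain) != (item in newdomain)

print(chained)
print(explicit)
```

With these values the chained form would take the "no match" branch even though both lists contain the item, while the parenthesized form takes the "match" branch.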

Upvotes: 3
