Chris

Reputation: 7310

Comparing strings not working

I have a list of article titles that I store in a text file and load into a list. I'm trying to compare the current title with all the titles in that list, like so:

def duplicate(entry):
    for line in posted_titles:
        print 'Comparing'
        print entry.title
        print line
        if line.lower() == entry.title.lower():
            print 'found duplicate'
            return True
    return False

My problem is that this never returns True. Even when entry.title and line print out as identical strings, they are not flagged as equal. Is there a string comparison method or something similar that I should be using?

Edit: After looking at the representations of the strings with repr(line), the strings being compared look like this:

u"Some Article Title About Things And Stuff - Publisher Name"
'Some Article Title About Things And Stuff - Publisher Name'

Upvotes: 0

Views: 264

Answers (2)

poke

Reputation: 388313

It would have helped even more if you had provided an actual example.

In any case, your problem is the two different string types in Python 2. entry.title is apparently a unicode string (denoted by the u before the quotes), while line is a normal (byte) str, or vice versa.
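To check which side is which without relying on how print renders the values, you can test the type directly. This is a minimal sketch: `entry_title`, `line` and `describe` are hypothetical stand-ins for the question's values, and it relies on `bytes` being an alias for `str` in Python 2.6+, so the same check works on Python 3:

```python
# Hypothetical stand-ins for the two values from the question.
entry_title = u"Some Article Title About Things And Stuff - Publisher Name"
line = b"Some Article Title About Things And Stuff - Publisher Name"


def describe(value):
    """Label a value as a byte string (Python 2 str / Python 3 bytes) or a text string."""
    kind = "byte string" if isinstance(value, bytes) else "text string"
    return "%s: %r" % (kind, value)


print(describe(entry_title))
print(describe(line))
```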

If both strings contain only ASCII characters, the equality comparison succeeds, because Python 2 implicitly decodes the str using its default (ASCII) encoding. For other characters it won’t:

>>> 'Ä' == u'Ä'
False

When doing the comparison in the reversed order, IDLE actually gives a warning here:

>>> u'Ä' == 'Ä'
Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

You can get a unicode string from a normal string by using str.decode and supplying the original encoding. For example, with latin1 in my IDLE:

>>> 'Ä'.decode('latin1')
u'\xc4'
>>> 'Ä'.decode('latin1') == u'Ä'
True
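The same idea can be wrapped in a small helper that decodes only when it is handed a byte string. This is a sketch: `ensure_text` is a hypothetical name, and the caller still has to know the real encoding of the bytes (latin-1 here, matching the IDLE session above). It runs on Python 2, where `bytes` is an alias for `str`, as well as on Python 3:

```python
def ensure_text(value, encoding):
    """Return value as a text (unicode) string.

    Byte strings (Python 2 str, Python 3 bytes) are decoded with the
    caller-supplied encoding; text strings are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


text_title = u"\xc4rger"                   # u'Ärger' as a text string
byte_title = text_title.encode("latin-1")  # the same word as latin-1 bytes

print(byte_title == text_title)                          # False: bytes vs text
print(ensure_text(byte_title, "latin-1") == text_title)  # True: both are text now
```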

If you know it’s utf-8, you can specify that instead. For example, the following file, saved as UTF-8, also prints True:

# -*- coding: utf-8 -*-
print('Ä'.decode('utf-8') == u'Ä')
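Applied to the question's code, one option is to decode the titles once when loading the file, so that every comparison is between text strings. A sketch under stated assumptions: the titles file is UTF-8 encoded, `load_titles` and `is_duplicate` are hypothetical names, the list is passed in explicitly instead of using the global `posted_titles`, and `title` is already a text string (as `entry.title` appears to be). It also strips surrounding whitespace, since lines read from a file keep their trailing newline:

```python
import io
import os
import tempfile


def load_titles(path, encoding="utf-8"):
    """Read one title per line, decoded to text, without the trailing newline."""
    with io.open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]


def is_duplicate(title, posted_titles):
    """Case-insensitive check of a text title against already-posted titles."""
    wanted = title.strip().lower()
    return any(posted.lower() == wanted for posted in posted_titles)


# Demo: a throwaway UTF-8 file stands in for the real titles file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Caf\xe9 Opening - Publisher Name\n")
    handle.write(u"Another Title - Publisher Name\n")

posted = load_titles(path)
os.remove(path)

print(is_duplicate(u"CAF\xc9 OPENING - PUBLISHER NAME", posted))  # True
print(is_duplicate(u"Something New", posted))                     # False
```

`io.open` accepts an `encoding` argument on both Python 2 and Python 3, which keeps the loaded list consistently unicode.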

Upvotes: 1

kiriloff

Reputation: 26333

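== is fine for string comparison. Make sure both sides are plain str objects:

if str(line).lower() == str(entry.title).lower():

In Python 2, str() raises UnicodeEncodeError for a unicode string that contains non-ASCII characters, so this conversion only works for ASCII titles.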

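Another syntax sometimes suggested is the boolean expression str1 is str2, but is tests object identity rather than equality, so two equal strings can still give False with it. Use == to compare string values.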
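`is` compares object identity, not value, so two strings with the same characters are not guaranteed to be the same object. A small sketch (the variable names are arbitrary):

```python
a = "hello world"
b = "".join(["hello", " world"])  # same characters, built at runtime

print(a == b)  # True: the values are equal
print(a is b)  # identity check; False in CPython, where join builds a new object
```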

Upvotes: 0
