user5526811

Check if string is in array

I'm trying to check whether some strings are in an array, like this:

intact_columns = [...]
for key, value in obj.iteritems():
    if key not in intact_columns:
        print key

The problem is that the array contains items like this: Reten\xc3\xa7\xc3\xa3o (RET)

And the strings that I'm iterating over look like this: Retenção (RET)

How can I convert the strings inside the array so they look like normal strings?

Upvotes: 3

Views: 1179

Answers (2)

Joran Beasley

Reputation: 113950

First, you really need to understand the encoding. At a guess, the items in the array are UTF-8 byte strings, while the keys you are checking appear to be unicode, so encode each key before comparing:

if key.encode("utf8") in intact_columns:

Note that I don't know what encoding is actually being used (but UTF-8 is usually a pretty safe guess).
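Here is a minimal sketch of that check applied to the whole loop. The sample values of intact_columns and obj are made up for illustration (the question does not show the real data), and it assumes the array really holds UTF-8 bytes. The b'' and u'' prefixes are only there so the snippet runs the same way on Python 2.7 and Python 3.

```python
# Hypothetical sample data: byte strings in the array, unicode keys in the dict.
intact_columns = [b'Reten\xc3\xa7\xc3\xa3o (RET)', b'Other column']
obj = {u'Reten\xe7\xe3o (RET)': 1, u'Missing column': 2}

# Encode each unicode key to UTF-8 bytes so both sides of "in" have the same type.
missing = [key for key in obj if key.encode('utf-8') not in intact_columns]

for key in missing:
    print(key)
```

With this sample data only the key that is genuinely absent from the array ends up in missing.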

An aside about encoding:

bytestring.decode('utf8') # -> results in unicode
unicodestr.encode('utf8') # -> results in bytestring

In Python 3 you cannot encode/decode unless the value is the appropriate type (unicode to encode, byte string to decode). In Python 2 it will try to implicitly encode or decode for you if you hand it the wrong thing, which is where you are running into issues.
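To make the two directions concrete, here is a small round trip using the string from the question. It assumes the bytes are UTF-8, as guessed above, and the variable names are just for illustration. It runs on both Python 2.7 and Python 3.

```python
raw = b'Reten\xc3\xa7\xc3\xa3o (RET)'   # byte string: each accented letter takes 2 bytes
text = raw.decode('utf-8')              # unicode text: each accented letter is 1 character
round_trip = text.encode('utf-8')       # back to the original bytes

print(len(raw))           # 16 bytes
print(len(text))          # 14 characters
print(round_trip == raw)  # True
```

The different lengths are a quick way to tell which of the two forms you are holding.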

Upvotes: 1

Moinuddin Quadri

Reputation: 48067

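The issue is that the two strings are not in the same form: the list holds encoded byte strings, while the key you compare against is unicode text. I am not sure which encoding was used, but the bytes \xc3\xa7\xc3\xa3 are what UTF-8 produces for çã, so decoding both values from UTF-8 and then comparing should work. For example: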

>>> my_list = ['Reten\xc3\xa7\xc3\xa3o (RET)', 'blah blah ...']
>>> my_string = 'Retenção (RET)'
>>> my_list[0].decode('utf-8')
u'Reten\xe7\xe3o (RET)'
>>> my_string.decode('utf-8')
u'Reten\xe7\xe3o (RET)'

Both hold the same decoded value, so they compare equal once decoded.
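If you are not sure which side is bytes and which is unicode, one option is to normalize everything to unicode before comparing. This is only a sketch: to_text is a hypothetical helper (not a library function), the sample data is made up, and UTF-8 is assumed as the encoding. It runs on both Python 2.7 and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Return value as unicode text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value

# Hypothetical sample data mixing byte strings and unicode on both sides.
intact_columns = [b'Reten\xc3\xa7\xc3\xa3o (RET)', u'Plain column']
obj = {u'Reten\xe7\xe3o (RET)': 1, b'Plain column': 2, u'Unknown': 3}

# Decode the columns once into a set, then check every key against it.
known = set(to_text(column) for column in intact_columns)
missing = sorted(to_text(key) for key in obj if to_text(key) not in known)

for key in missing:
    print(key)
```

Building the set once also avoids re-scanning the list for every key, which helps if obj is large.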

Upvotes: 1
