Reputation:
I'm trying to check whether some strings are in an array, like this:
intact_columns = [...]
for key, value in obj.iteritems():
    if key not in intact_columns:
        print key
The problem is that the array contains items like this: Reten\xc3\xa7\xc3\xa3o (RET)
while the strings I'm iterating over look like this: Retenção (RET)
How can I convert the strings inside the array so that they look like normal strings?
Upvotes: 3
Views: 1179
Reputation: 113950
First you really need to understand the encoding. At a guess, the items in the array are UTF-8 byte strings, and the keys you are checking appear to be unicode, so encode each key before the membership test:
if key.encode("utf8") in intact_columns:
Note that I don't know what encoding is actually being used (but UTF-8 is usually a pretty safe guess).
An aside about encoding:
bytestring.decode('utf8') # -> results in unicode
unicodestr.encode('utf8') # -> results in bytestring
In Python 3 you cannot encode or decode unless the object is the appropriate type: only unicode strings (str) have encode, and only byte strings (bytes) have decode. In Python 2 it will try to implicitly encode or decode for you if you hand it the wrong type, which is where you are running into issues.
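As a minimal sketch of the two directions described above, assuming the array really does hold UTF-8 byte strings and the keys are unicode (the sample values are made up to mirror the question, and the explicit b'' and u'' prefixes are only there so the snippet behaves the same on Python 2 and Python 3):

```python
# Hypothetical data mirroring the question: byte strings in the list,
# a unicode string as the key being looked up.
intact_columns = [b'Reten\xc3\xa7\xc3\xa3o (RET)', b'Other column']
key = u'Reten\xe7\xe3o (RET)'

# Option 1: encode the unicode key to bytes and compare bytes with bytes.
found_by_encoding = key.encode('utf-8') in intact_columns

# Option 2: decode the list once and compare unicode with unicode.
decoded_columns = set(col.decode('utf-8') for col in intact_columns)
found_by_decoding = key in decoded_columns
```

Decoding the list once is probably the better fit when many keys are checked, since each lookup then needs no per-key conversion.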
Upvotes: 1
Reputation: 48067
The issue is that the two strings are not in the same encoding. I am not sure which encoding is in use, but UTF-8 is a reasonable guess (use whichever encoding the data was actually written in), so decode both to unicode
and then compare. For example:
>>> my_list = ['Reten\xc3\xa7\xc3\xa3o (RET)', 'blah blah ...']
>>> my_string = 'Retenção (RET)'
>>> my_list[0].decode('utf-8')
u'Reten\xe7\xe3o (RET)'
>>> my_string.decode('utf-8')
u'Reten\xe7\xe3o (RET)'
Both hold the same decoded value.
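Applied to the loop from the question, that idea might look like the sketch below. The helper to_text and the sample data are hypothetical, and UTF-8 is still only an assumption about how the byte strings were encoded:

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Hypothetical data: a list mixing byte strings and unicode strings,
# and a dict whose keys are unicode.
intact_columns = [b'Reten\xc3\xa7\xc3\xa3o (RET)', u'Name']
obj = {u'Reten\xe7\xe3o (RET)': 1, u'Name': 2, u'Extra': 3}

# Normalize the list once, then test each key against the normalized set.
known = set(to_text(col) for col in intact_columns)
missing = sorted(key for key in obj if to_text(key) not in known)
print(missing)
```

With this data only the Extra key is reported as missing, because the accented column name now matches after both sides are converted to unicode.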
Upvotes: 1