Reputation: 23
The first string below was typed by me, while the second was pulled from a database.
bytes('TOYOTA', 'utf-8')
>> b'TOYOTA'
bytes('ΤΟΥΟΤΑ', 'utf-8')
>> b'\xce\xa4\xce\x9f\xce\xa5\xce\x9f\xce\xa4\xce\x91'
This causes undesirable results when I check whether the value exists:
'TOYOTA' == 'ΤΟΥΟΤΑ'
>> False
Any idea how to "fix" the incorrect string?
Upvotes: 1
Views: 85
Reputation: 23117
It appears those are Greek capital letters:
>>> import unicodedata
>>> s = 'ΤΟΥΟΤΑ'
>>> for c in s:
...     print(unicodedata.name(c))
...
GREEK CAPITAL LETTER TAU
GREEK CAPITAL LETTER OMICRON
GREEK CAPITAL LETTER UPSILON
GREEK CAPITAL LETTER OMICRON
GREEK CAPITAL LETTER TAU
GREEK CAPITAL LETTER ALPHA
You could try one of the available third-party libraries to transliterate the text to the Latin alphabet.
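If the database values only contain Greek capitals that happen to look like Latin ones, a hand-written mapping with `str.translate` may be enough. This is only a sketch using the standard library, not a transliteration library; the table `GREEK_TO_LATIN_LOOKALIKES` and the helper `fold_lookalikes` are names made up for this example, and the mapping goes by visual appearance, not by sound.

```python
# Hypothetical lookup table: Greek capitals that are visually identical
# to Latin capitals, keyed by code point to avoid copy/paste confusion.
GREEK_TO_LATIN_LOOKALIKES = str.maketrans({
    '\u0391': 'A',  # GREEK CAPITAL LETTER ALPHA
    '\u0392': 'B',  # GREEK CAPITAL LETTER BETA
    '\u0395': 'E',  # GREEK CAPITAL LETTER EPSILON
    '\u0396': 'Z',  # GREEK CAPITAL LETTER ZETA
    '\u0397': 'H',  # GREEK CAPITAL LETTER ETA
    '\u0399': 'I',  # GREEK CAPITAL LETTER IOTA
    '\u039a': 'K',  # GREEK CAPITAL LETTER KAPPA
    '\u039c': 'M',  # GREEK CAPITAL LETTER MU
    '\u039d': 'N',  # GREEK CAPITAL LETTER NU
    '\u039f': 'O',  # GREEK CAPITAL LETTER OMICRON
    '\u03a1': 'P',  # GREEK CAPITAL LETTER RHO
    '\u03a4': 'T',  # GREEK CAPITAL LETTER TAU
    '\u03a5': 'Y',  # GREEK CAPITAL LETTER UPSILON
    '\u03a7': 'X',  # GREEK CAPITAL LETTER CHI
})


def fold_lookalikes(text):
    """Replace lookalike Greek capitals with their Latin twins."""
    return text.translate(GREEK_TO_LATIN_LOOKALIKES)


# The database value from the question, written as escapes.
db_value = '\u03a4\u039f\u03a5\u039f\u03a4\u0391'
print(fold_lookalikes(db_value) == 'TOYOTA')  # True
```

Characters that are not in the table pass through unchanged, so genuinely Greek text will still not compare equal to a Latin string.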
This is a similar question: How can I create a string in english letters from another language word?
Upvotes: 2