Reputation: 23
I need to test if a certain string (for example 'võiks') equals the name of any of the files contained in a directory.
>>> from os import listdir
>>> from os.path import isfile, join
>>> words = [f.replace('.html', '') for f in listdir('lemma_pages/test') if isfile(join('lemma_pages/test', f))]
>>> words
['võibolla', 'võid', 'võiks', 'võimalik', 'võin', 'võta', 'võtan', 'võtta']
>>> 'võiks' in words
False
The test returns False, although I expected True. I open the file containing the words like this:
open('et_500.txt', 'rt', encoding="utf-8")
Any idea what I am doing wrong?
Upvotes: 0
Views: 54
Reputation: 76541
The data may not be Unicode-normalized. Before comparing the strings, normalize both sides with:
data = unicodedata.normalize('NFC', data)
To provide some more detail: õ could be the single code point U+00F5 (LATIN SMALL LETTER O WITH TILDE), or it could be U+006F (LATIN SMALL LETTER O) followed by U+0303 (COMBINING TILDE). The two forms look identical when printed but compare as different strings. Normalizing is necessary so that, whichever form you get, they compare as equal.
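As a minimal sketch of the idea, the snippet below builds the same word in both forms and shows that the comparison only succeeds after normalization. The helper name nfc and the sample list are made up for illustration; in your case the list would come from listdir and the search word from et_500.txt. It assumes the mismatch really is a composed versus decomposed õ, which you can check by comparing len() of the two strings.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize('NFC', s)


# Same visible word, two different code point sequences.
composed = 'v\u00f5iks'      # õ as one code point, U+00F5
decomposed = 'vo\u0303iks'   # o (U+006F) followed by combining tilde (U+0303)

# Without normalization the strings differ, even in length.
raw_equal = composed == decomposed
lengths = (len(composed), len(decomposed))

# Hypothetical stand-in for the file names read from the directory.
words = [decomposed, 'v\u00f5id']

# Normalize both sides before the membership test.
normalized_words = {nfc(w) for w in words}
found = nfc(composed) in normalized_words

print(raw_equal, lengths, found)
```

Normalizing once when the list is built, and once for each word read from the text file, keeps the membership test itself cheap.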
Upvotes: 2