Reputation: 23
Suppose I have the following files in path, which is in my Google Drive, connected to a Python 3 Colab notebook:
(Here, the # line represents the output)
import os
ls = os.listdir(path)
print(ls)
# ['á.csv', 'b.csv']
Everything seems OK, but if I write
'á.csv' in ls
# False
But it should return True. However, if I repeat the last snippet but, instead of typing 'á.csv', I copy-paste it manually from the output of print(ls), it returns True.
Thanks
PS: The problem is not specific to that filename; it happens with several filenames that contain special characters (namely í, á, é, ó, ñ).
Upvotes: 2
Views: 1353
Reputation: 40838
You can normalize the filenames to a single Unicode form before comparing them.
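import os
from unicodedata import normalize
# convert every listed filename to the composed (NFC) form
ls = [normalize('NFC', f) for f in os.listdir(path)]
# compare
normalize('NFC', 'á.csv') in ls
# or just 'á.csv' in ls, if the typed literal is already in NFC form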
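If you do this check in several places, it may help to wrap it in a small function that normalizes both sides. The following is only a sketch: `contains_filename` is a hypothetical helper name, and the listing is simulated on the assumption that the Drive mount returns the accented letters in decomposed form, which the question does not confirm.

```python
from unicodedata import normalize

def contains_filename(name, filenames, form="NFC"):
    """Return True if `name` equals any entry once both are normalized."""
    target = normalize(form, name)
    return any(normalize(form, f) == target for f in filenames)

# Simulated os.listdir() result (assumption): 'á' stored as 'a' + U+0301
listing = ["a\u0301.csv", "b.csv"]
# What you get when typing 'á' as the single code point U+00E1
typed = "\u00e1.csv"

print(typed in listing)                   # False
print(contains_filename(typed, listing))  # True
```

Normalizing both the typed name and the listed names means the comparison does not depend on which form either side happens to use.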
Upvotes: 2
Reputation: 2000
I believe it is because some accented characters can be represented in more than one way in Unicode. For example, á can be the single code point U+00E1, or the letter a followed by the combining acute accent U+0301. The two look identical on screen but are different strings with different codes. Try 'á'.encode() once by typing á and once again by copy-pasting it as you did. If the bytes look different, that's because they are different character sequences that look identical.
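To see the difference without relying on copy-paste, you can build both forms with escape sequences. This is a small illustration that assumes the two strings involved are the composed and decomposed forms of á; the variable names are just for the example.

```python
import unicodedata

composed = "\u00e1"     # á as a single code point
decomposed = "a\u0301"  # 'a' followed by a combining acute accent

print(composed.encode("utf-8"))        # b'\xc3\xa1'
print(decomposed.encode("utf-8"))      # b'a\xcc\x81'
print(len(composed), len(decomposed))  # 1 2
print(composed == decomposed)          # False

# Name each code point to see what the string is actually made of
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING ACUTE ACCENT']
```

Running the same `unicodedata.name` loop over a filename copied from your listing should show which form it uses.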
Upvotes: 1