felipekare

Reputation: 23

os.listdir returns strange strings for filenames with special characters

Suppose I have the following files in path, a folder in my Google Drive that is connected to a Python 3 Colab notebook:

(Here, the # line represents the output)

import os

ls = os.listdir(path)
print(ls)
# ['á.csv', 'b.csv']

Everything seems OK, but if I write

'á.csv' in ls
# False

It should return True. However, if I run the same check again but copy-paste 'á.csv' from the output of print(ls) instead of typing it, it returns True.

Thanks

PS: The problem is not limited to that one filename; it affects several filenames that contain special characters (namely í, á, é, ó, ñ).

Upvotes: 2

Views: 1353

Answers (2)

korakot

Reputation: 40838

You can normalize the file names to a single Unicode form before comparing them.

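import os
from unicodedata import normalize

# Normalize every name returned by os.listdir to NFC (composed form)
ls = [normalize('NFC', f) for f in os.listdir(path)]
# Normalize the name being searched for the same way, then compare
normalize('NFC', 'á.csv') in ls
# A plain 'á.csv' in ls also works if the typed literal is already in NFC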
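
A minimal sketch of the same idea wrapped in a helper, for cases where the check is repeated. The function name contains_name and the simulated listing are hypothetical and not part of the original answer; the listing assumes the directory returned the decomposed form of á, which is one plausible cause of the mismatch.

```python
from unicodedata import normalize


def contains_name(names, target, form="NFC"):
    """Return True if target equals any entry once both are normalized."""
    wanted = normalize(form, target)
    return any(normalize(form, name) == wanted for name in names)


# Simulated listing (assumption): the first name is stored decomposed,
# i.e. 'a' followed by U+0301 COMBINING ACUTE ACCENT.
listing = ["a\u0301.csv", "b.csv"]

# The name as typed on a keyboard is usually the composed form U+00E1.
typed = "\u00e1.csv"

plain_result = typed in listing                     # raw comparison
normalized_result = contains_name(listing, typed)   # normalized comparison

print(plain_result, normalized_result)
```

Normalizing both sides avoids relying on how the typed literal happens to be encoded.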

Upvotes: 2

Hurried-Helpful

Reputation: 2000

I believe this happens because some accented characters can be represented in more than one way in Unicode. Two strings can look identical on screen yet consist of different code points: for example, á can be the single code point U+00E1 or the letter a followed by the combining acute accent U+0301. Try 'á'.encode() once with the á you typed and once with the one you copy-pasted. If the bytes differ, the two strings are different code point sequences that merely look identical.
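
The check suggested above can be sketched as follows. The two strings are written with explicit escapes so the difference survives copy-pasting; which form the asker's file names actually use is an assumption here, not something confirmed in the question.

```python
import unicodedata

composed = "\u00e1"      # 'á' as a single code point
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

# The two render the same but are not equal as Python strings.
print(composed == decomposed)   # False

# Their UTF-8 encodings differ as well.
print(composed.encode())        # b'\xc3\xa1'
print(decomposed.encode())      # b'a\xcc\x81'

# NFC normalization maps the decomposed form onto the composed one.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)           # True
```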

Upvotes: 1
