DevB2F

Reputation: 5085

len(str) giving wrong result after retrieving str from filename

Any idea why I am getting a length of 6 instead of 5? I created a file called björn-100.png and ran the following code with Python 3:

import os

for f in os.listdir("."):
    p = f.find("-")
    name = f[:p]
    print("name")
    print(name)
    length = len(name)
    print(length)
    for a in name:
        print(a)

It prints the following:

name
björn
6
b
j
o
̈
r
n

instead of the output I expected:

name
björn
5
b
j
ö
r
n

Upvotes: 1

Views: 438

Answers (2)

blhsing

Reputation: 106543

If you're using Python 2.7, the file name is a byte string, so decode it as UTF-8 first to count characters rather than bytes:

length = len(name.decode('utf-8'))

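But since you're using Python 3, the name is already a Unicode str and there is nothing to decode. The extra character appears because the file name is stored in decomposed form: the ö is an o followed by a separate combining diaeresis (U+0308), which is why your loop prints the two dots on their own line. I recommend using unicodedata to normalize the string to NFC, which combines the pair into a single character: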

import unicodedata
length = len(unicodedata.normalize('NFC', name))
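To make the difference visible, here is a minimal sketch that builds the decomposed name by hand and compares it with its NFC form. The literal "bjo\u0308rn" is my own stand-in for what os.listdir returned in the question; it is an assumption that your file system produced exactly this sequence.

```python
import unicodedata

# Assumed stand-in for the name from the question: "o" followed by
# U+0308 COMBINING DIAERESIS, written with an escape so the example
# does not depend on how this text was saved.
decomposed = "bjo\u0308rn"

# NFC composes the base letter and the combining mark into one code point.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))  # 6
print(len(composed))    # 5

# Show what the two code points in the decomposed form actually are.
for ch in decomposed[2:4]:
    print(hex(ord(ch)), unicodedata.name(ch))
```

The two strings look identical when printed, but they compare unequal until both are normalized to the same form.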

Upvotes: 3

DevB2F

Reputation: 5085

To get the string with ö as a single character, instead of an o followed by a separate two-dots character, normalize it to NFC:

import unicodedata

name = unicodedata.normalize('NFC', name)
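Applied to the code from the question, this could look roughly like the sketch below. The helper name_prefix and its sep parameter are hypothetical names I made up for illustration, and the sample file name is written with an escape to stand in for what os.listdir returned.

```python
import unicodedata


def name_prefix(filename, sep="-"):
    """Return the part of filename before the first sep, normalized to NFC.

    Hypothetical helper for illustration. Unlike f[:f.find("-")], a name
    without sep is returned whole instead of losing its last character.
    """
    normalized = unicodedata.normalize("NFC", filename)
    head, _, _ = normalized.partition(sep)
    return head


# Assumed decomposed file name, as in the question.
prefix = name_prefix("bjo\u0308rn-100.png")
print(len(prefix))  # 5
for ch in prefix:
    print(ch)
```

In the original loop you would call name_prefix(f) for each f returned by os.listdir(".").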

Upvotes: 1
