J86

Reputation: 15237

Convert non-standard Latin characters in a string to standard ones

I'm not sure if "standard" and "non-standard" are the right terms to use, apologies.

I basically have a bunch of names, such as:

Agit Işık
Ruşen Ünaydın
Candî Hissên

And I want them converted to:

agit-isik
rusen-unaydin
candi-hissen

I have created the following function that works most of the time, but not always:

import unicodedata

def get_name_slug(name):
    formatted_name = name.lower().replace(' ', '-')
    slug = unicodedata.normalize('NFD', formatted_name).encode('ascii', 'ignore')

    return slug.decode('utf-8')

The result of the above function is:

agit-isk
rusen-unaydn
candi-hissen

Notice how Agit Işık and Ruşen Ünaydın failed to convert properly: the dotless ı was dropped instead of becoming i.

What am I missing?

Upvotes: 1

Views: 871

Answers (1)

rikyeah

Reputation: 2013

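The dotless ı (U+0131) has no Unicode decomposition, so NFD leaves it unchanged and encode('ascii', 'ignore') then drops it. You can fix such special cases by hand, something like: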

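import unicodedata

def get_name_slug(name):
    # Map the dotless ı to a plain i before normalizing, since NFD cannot decompose it
    formatted_name = name.lower().replace(' ', '-').replace('ı', 'i')
    slug = unicodedata.normalize('NFD', formatted_name).encode('ascii', 'ignore')

    return slug.decode('utf-8')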
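
If you expect more letters like this, one option is to keep the special cases in a translation table instead of chaining .replace() calls. This is only a sketch: the FALLBACKS name and the entries other than ı are my own assumptions for illustration, not a complete list, so extend it for the names you actually have.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition.
# Only 'ı' comes from the question; the other entries are illustrative guesses.
FALLBACKS = str.maketrans({
    'ı': 'i',   # dotless i
    'ø': 'o',
    'ł': 'l',
    'đ': 'd',
    'ß': 'ss',
})

def get_name_slug(name):
    # Lowercase, hyphenate, then replace letters that NFD cannot split
    formatted_name = name.lower().replace(' ', '-').translate(FALLBACKS)
    # Split accented letters into base letter + combining mark
    decomposed = unicodedata.normalize('NFD', formatted_name)
    # Dropping non-ASCII bytes removes the combining marks
    return decomposed.encode('ascii', 'ignore').decode('ascii')

for name in ['Agit Işık', 'Ruşen Ünaydın', 'Candî Hissên']:
    print(get_name_slug(name))
```

You can check whether a character needs an entry with unicodedata.decomposition(): it returns an empty string for 'ı', while 'ş' decomposes into s plus a combining cedilla.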

Upvotes: 2
