Reputation: 125
Is there a relatively straightforward way of removing diacritics from Greek strings? For example, if the string is "Ο πάνω όροφος", I want it to become "ο πανω οροφος": still in Greek, but without the accents. I want to avoid string replace because it can be slow, and most answers to similar questions use unidecode, which converts the Greek characters to English ones, and I don't want that.
Upvotes: 2
Views: 2288
Reputation: 178030
Most official papers for anything need to have capitals only and without diacritics.
Does this work?
>>> import unicodedata as ud
>>> s="Ο πάνω όροφος"
>>> d = {ord('\N{COMBINING ACUTE ACCENT}'):None}
>>> ud.normalize('NFD',s).upper().translate(d)
'Ο ΠΑΝΩ ΟΡΟΦΟΣ'
Normalizing with NFD separates base code points from diacritics. The d translation table maps Unicode ordinals to their replacements; in this case it maps the combining acute accent to None, which deletes it. I'm not familiar with Greek diacritic usage, so the table may need to be expanded.
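If the text can contain marks other than the acute accent (for example the dialytika in "προϊόν" or polytonic breathings), one way to expand the table is to delete every combining code point. This is only a minimal sketch, assuming all such marks should be dropped; the names STRIP_MARKS and strip_marks are illustrative and not part of the code above, and the final NFC step is an optional addition.

```python
import sys
import unicodedata as ud

# Map every code point with a nonzero canonical combining class to None,
# so that str.translate() deletes it.
# Assumption: every combining mark should go, not only the acute accent.
STRIP_MARKS = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1) if ud.combining(chr(cp))
)

def strip_marks(text):
    """Return text with all combining marks removed (hypothetical helper)."""
    # NFD splits each accented letter into base letter + combining marks.
    decomposed = ud.normalize("NFD", text)
    # Delete the marks, then recompose anything that is still decomposed.
    return ud.normalize("NFC", decomposed.translate(STRIP_MARKS))

print(strip_marks("Ο πάνω όροφος"))
print(strip_marks("προϊόν"))
```

Building the table scans the whole Unicode range once, so it is best created a single time at module level and reused.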
.replace('\u0301','') could be used for one accent, but .translate() is more efficient if there are multiple replacements.
Skip .upper() to match your original question:
>>> ud.normalize('NFD',s).translate(d)
'Ο πανω οροφος'
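The example in the question also lowercases the leading "Ο". If that is wanted, the same steps can be wrapped in a small function with an optional lowercasing step. This is a sketch, not a definitive implementation; remove_tonos and its lower parameter are hypothetical names introduced here.

```python
import unicodedata as ud

# Same single-entry table as above: delete the combining acute accent.
TONOS = {ord("\N{COMBINING ACUTE ACCENT}"): None}

def remove_tonos(text, lower=False):
    """Strip the acute accent; optionally lowercase (hypothetical helper)."""
    stripped = ud.normalize("NFD", text).translate(TONOS)
    return stripped.lower() if lower else stripped

print(remove_tonos("Ο πάνω όροφος", lower=True))  # ο πανω οροφος
```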
Upvotes: 9