mobelahcen
mobelahcen

Reputation: 424

How to keep special characters intact in IPA with python

I want to generate a list of characters in a word without altering special characters like ɑ̃ e.g:

word =  "aplavɑ̃tʁɛ"
list(word)
['a', 'p', 'l', 'a', 'v', 'ɑ', '̃', 't', 'ʁ', 'ɛ']

I want to have: ['a', 'p', 'l', 'a', 'v', 'ɑ̃', 't', 'ʁ', 'ɛ']

Upvotes: 1

Views: 259

Answers (1)

tripleee
tripleee

Reputation: 189689

You want to check if a character is a combining character, and if so, keep it together with the preceding character.

import unicodedata


word =  "aplavɑ̃tʁɛ"
characters = []
for char in word:
    if unicodedata.combining(char):
        characters[-1] += char
    else:
        characters.append(char)
print(characters)

Result:

['a', 'p', 'l', 'a', 'v', 'ɑ̃', 't', 'ʁ', 'ɛ']

Unicode facilitates the combination of arbitrary glyphs with joining modifiers (diacritics etc) so that you can build up characters with multiple accents, which is frequently required in IPA and e.g. Vietnamese orthography. (This is sometimes abused in something called "zalgotext".)

Upvotes: 1

Related Questions