LubieCiastka

Reputation: 143

Python: Write diacritical marks to a file as escape sequences

I read lines of text from an input file, and after trimming them I have strings like these:

-pokaż wszystko-
–ყველას გამოჩენა–

and I must write something like this to another file:

-poka\017C wszystko-
 \2013\10E7\10D5\10D4\10DA\10D0\10E1 \10D2\10D0\10DB\10DD\10E9\10D4\10DC\10D0\2013

My Python script starts like this:

file_input = open('input.txt', 'r', encoding='utf-8')
file_output = open('output.txt', 'w', encoding='utf-8')

Unfortunately, what gets written to the file is not the expected output.

I was given a hint about why I have to change it, but I can't figure out the conversion:

If the diacritics are saved as UTF-8 ("-pokaż wszystko-"), the script works correctly only if NLS_LANG = AMERICAN_AMERICA.AL32UTF8

If the output file has the diacritics saved in escaped form ("-poka\017C wszystko-"), the script works correctly for any NLS_LANG setting

Upvotes: 0

Views: 149

Answers (1)

Mark Tolonen

Reputation: 177795

A Python 3.6+ solution (it uses f-strings): format each character outside the ASCII range as a backslash followed by its hex code point:

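# coding: utf8
strings = ['-pokaż wszystko-', '–ყველას გამოჩენა–']

def convert(s):
    # Keep ASCII characters as-is; write every other character as a
    # backslash followed by its code point in at least four hex digits.
    return ''.join(x if ord(x) < 128 else f'\\{ord(x):04X}' for x in s)

for t in strings:
    print(convert(t))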

Output:

-poka\017C wszystko-
\2013\10E7\10D5\10D4\10DA\10D0\10E1 \10D2\10D0\10DB\10DD\10E9\10D4\10DC\10D0\2013
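Applied to the files from the question, the same conversion could look roughly like the sketch below. The helper name convert_file is hypothetical (it is not from the question), and the sketch assumes every line should be converted unchanged apart from the escaping. Because the converted text is pure ASCII, the output file is opened with encoding='ascii' so that any character that slipped through would raise an error instead of being written silently.

```python
def convert(s):
    # Same conversion as in the answer: keep ASCII, escape the rest as \XXXX.
    return ''.join(x if ord(x) < 128 else f'\\{ord(x):04X}' for x in s)


def convert_file(src_path, dst_path):
    # Hypothetical helper: read UTF-8 text line by line and write the
    # escaped, ASCII-only form of each line to the destination file.
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='ascii') as dst:
        for line in src:
            # The trailing newline is ASCII, so convert() leaves it alone.
            dst.write(convert(line))
```

Calling convert_file('input.txt', 'output.txt') would then replace the two open() calls from the question.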

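Note: I don't know if or how you want to handle Unicode characters outside the basic multilingual plane (BMP, > U+FFFF). For those, this code emits five or six hex digits (for example \1F600 for U+1F600), which a consumer expecting exactly four digits will probably misread. I need more information about your escape sequence requirements.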
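If the consumer turns out to expect exactly four hex digits per escape, with characters beyond the BMP written as UTF-16 surrogate pairs, a variant along these lines might work. That requirement is an assumption, not something the question states, and the name convert_utf16 is made up for this sketch.

```python
def convert_utf16(s):
    # Assumption: non-ASCII characters are written as \XXXX per UTF-16
    # code unit, so a character above U+FFFF becomes two escapes.
    out = []
    for ch in s:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Big-endian UTF-16 yields 2 bytes (BMP) or 4 bytes (surrogate pair).
        data = ch.encode('utf-16-be')
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], 'big')
            out.append(f'\\{unit:04X}')
    return ''.join(out)
```

For BMP-only input such as the two strings in the question, this produces the same output as convert().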

Upvotes: 1
