fyngyrz

Reputation: 2658

Unicode string to backslash-encoded non-unicode 7-bit ASCII in python?

Environment: Python 2.6 ... Python 2.higher-than-6

I have valid unicode (u'') strings that I need to convert into plain 7-bit ASCII strings (ordinary Python 2.6-ish str objects), with every non-ASCII character written as a backslash escape. Like so:

def conversionSolution(utf8StringInput):
    ...
    return asciiStringResult

utf8string = u'\u5f00\u80c3\u83dc'
asciistring = conversionSolution(utf8string)
print asciistring

With ... filled in, the above would print out...

\u5f00\u80c3\u83dc

and not...

开胃菜

Let me emphasize that I do not want the UTF-8 here; I specifically require 0-127 encoded ASCII backslash data that I can subsequently manipulate strictly as 7-bit ASCII.

Upvotes: 0

Views: 123

Answers (2)

Robᵩ

Reputation: 168706

def conversionSolution(utf8StringInput):
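    # On Python 2, repr() of a unicode string yields u'...' with non-ASCII
    # characters escaped; drop the leading u' and the trailing quote.
    return repr(utf8StringInput)[2:-1]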

utf8string = u'\u5f00\u80c3\u83dc'
asciistring = conversionSolution(utf8string)
print asciistring

Upvotes: 1

roeland

Reputation: 5751

You could call .encode('unicode-escape') to do this.
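
A minimal sketch of that call, reusing the function name from the question. The final .decode('ascii') is only there so the print looks the same on Python 2 and Python 3; on Python 2 the encoded value is already a plain str. Note that this codec also escapes backslashes, newlines and tabs, and writes characters in the range 128-255 as \xNN rather than \u00NN.

```python
def conversionSolution(utf8StringInput):
    # 'unicode-escape' replaces every non-ASCII character with a
    # backslash escape and returns a byte string (str on Python 2,
    # bytes on Python 3) containing only values 0-127.
    return utf8StringInput.encode('unicode-escape')

utf8string = u'\u5f00\u80c3\u83dc'
asciistring = conversionSolution(utf8string)
print(asciistring.decode('ascii'))  # prints \u5f00\u80c3\u83dc

# Decoding with the same codec recovers the original unicode string.
roundtrip = asciistring.decode('unicode-escape')
```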

That being said, you're talking about manipulating that string afterwards, and there is not much useful you can do with it in that form. E.g. if you slice it, you may cut through the middle of an escape sequence. Case folding of course doesn't work, etc. If you need to manipulate the string, you should keep it as a unicode string and escape it only at the end.
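
To illustrate the slicing problem, here is a small sketch (the variable names are made up for the example). The escaped form is six ASCII characters per original character, so an index that made sense on the unicode string lands inside an escape sequence on the escaped one.

```python
text = u'\u5f00\u80c3\u83dc'
escaped = text.encode('unicode-escape').decode('ascii')

print(len(text))     # 3
print(len(escaped))  # 18

# Slicing the escaped form by position cuts an escape in half.
broken = escaped[:4]
print(broken)        # prints \u5f

# Safer: go back to unicode, slice there, then escape again.
restored = escaped.encode('ascii').decode('unicode-escape')
first_char = restored[:1]
print(first_char.encode('unicode-escape').decode('ascii'))  # prints \u5f00
```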

Upvotes: 1
