Rahul

Reputation: 11560

Python: convert a string with escaped Unicode sequences to a Unicode string

Is there a built-in way to do this?

rawstr = r"3 \u176? \u177? 0.2\u176? (2\u952?)"
# required result: 3 ° ± 0.2° (2θ)

Something like:

In [1]: rawstr.unescape()
Out[1]: '3 ° ± 0.2° (2θ)'

The question is how to decode the escape sequences in rawstr into the actual Unicode characters.

See my answer below for what I am doing right now.

Please answer if there is a better option.

Upvotes: 4

Views: 791

Answers (2)

Rahul

Reputation: 11560

If you are on Windows and have pythonnet installed:

import clr
clr.AddReference("System")
clr.AddReference("System.Windows.Forms")
import System.Windows.Forms as WinForms

def rtf_to_text(rtf_str):
    """Converts rtf to text"""

    rtf = r"{\rtf1\ansi\ansicpg1252" + '\n' + rtf_str + '\n' + '}'
    richTextBox = WinForms.RichTextBox()
    richTextBox.Rtf = rtf
    return richTextBox.Text

print(rtf_to_text(r'3 \u176? \u177? 0.2\u176? (2\u952?)'))
# --> 3 ° ± 0.2° (2θ)

Upvotes: 1

Mathieu Paturel

Reputation: 4500

Yep, there is!

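For Python 2 (use 'unicode_escape'; 'string_escape' only handles byte escapes such as \n and \x41 and leaves \uXXXX untouched):

print r'your string'.decode('unicode_escape')

For Python 3, you need the text as bytes first, and then call decode:

print(rb'your string'.decode('unicode_escape'))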

Note that this doesn't work in your case, because your string uses decimal sequences such as \u176? rather than Python's four-hex-digit \uXXXX escapes (writing them in a normal, non-raw string literal doesn't work either).


Your string should look like this:

rb'3\u00B0 \u00b1 0.2\u00B0 2\u03B8'
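
The \u176? form in the question looks like RTF's decimal Unicode escape (\uN followed by a fallback character), which would explain why the asker's own answer goes through an RTF control. As a rough pure-Python sketch, assuming every escape is \uN followed by exactly one '?' fallback character, a regular expression can do the conversion in Python 3. The name rtf_unescape is made up for this example and is not a built-in.

```python
import re

# Assumption: each escape is "\u", a decimal number, then a single "?" fallback.
_RTF_UNICODE = re.compile(r"\\u(-?\d+)\?")

def rtf_unescape(text):
    """Replace RTF-style \\uN? escapes with the characters they denote."""
    def _replace(match):
        code = int(match.group(1))
        if code < 0:
            # RTF stores N as a signed 16-bit value, so large values appear negative.
            code += 65536
        return chr(code)
    return _RTF_UNICODE.sub(_replace, text)

rawstr = r"3 \u176? \u177? 0.2\u176? (2\u952?)"
print(rtf_unescape(rawstr))  # 3 ° ± 0.2° (2θ)
```

This avoids the Windows-only dependency, but it is not a full RTF parser: other control words, and fallback sequences longer than one character, are left as they are.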

Note that if you need to convert a str to bytes in Python 3, you can use the bytes function (in Python 2 a plain str is already a byte string, so no conversion is needed).

my_str = r'3\u00B0 \u00b1 0.2\u00B0 2\u03B8'
print my_str.decode('unicode_escape')  # Python 2: str is already bytes
my_bytes = bytes(my_str, 'utf-8')  # Python 3 only
print(my_bytes.decode('unicode_escape'))  # Python 3
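
In Python 3, a slightly safer variant of this round trip is to encode with 'ascii' instead of 'utf-8'. The unicode_escape codec reads every non-escaped byte as Latin-1, so the UTF-8 bytes of a real non-ASCII character would be silently mangled, whereas encoding as ASCII raises an error up front. A minimal sketch, with variable names that are only illustrative:

```python
# Input is assumed to contain only ASCII characters plus \uXXXX escapes.
escaped = r"3\u00B0 \u00b1 0.2\u00B0 2\u03B8"

# encode("ascii") fails loudly on non-ASCII input instead of corrupting it.
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded)  # 3° ± 0.2° 2θ
```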

Upvotes: 2
