Reputation: 11560
Is there any built-in way to do this?
rawstr = r"3 \u176? \u177? 0.2\u176? (2\u952?)"
#required str is 3 ° ± 0.2° (2θ).
Something like:
In [1]: rawstr.unescape()  # hypothetical method
Out[1]: '3 ° ± 0.2° (2θ)'
The question is how to convert rawstr to 'utf-8'.
Please see my answer for more clarity.
Please answer if there is a better option than what I am doing right now.
Upvotes: 4
Views: 791
Reputation: 11560
If you are on Windows and have pythonnet installed:
import clr
clr.AddReference("System")
clr.AddReference("System.Windows.Forms")
import System.Windows.Forms as WinForms

def rtf_to_text(rtf_str):
    """Converts rtf to text"""
    # Wrap the fragment in a minimal RTF document so the control can parse it
    rtf = r"{\rtf1\ansi\ansicpg1252" + '\n' + rtf_str + '\n' + '}'
    richTextBox = WinForms.RichTextBox()
    richTextBox.Rtf = rtf
    return richTextBox.Text

print(rtf_to_text(r'3 \u176? \u177? 0.2\u176? (2\u952?)'))
# --> '3 ° ± 0.2° (2θ)'
Upvotes: 1
Reputation: 4500
Yep, there is!
For Python 2:
print r'your string'.decode('string_escape')
For Python 3, you need to convert it to bytes first, and then use decode:
print(rb'your string'.decode('unicode_escape'))
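Note that this doesn't work in your case, since your symbols aren't Python escapes: \u176? is an RTF-style escape (a decimal code point followed by a ? fallback character), whereas Python expects \u followed by four hex digits. Even if you print them the "normal" way, it doesn't work.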
Your string should be like this:
rb'3\u00B0 \u00b1 0.2\u00B0 2\u03B8'
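If you can't change the input, a regular expression can translate the \u176? style sequences from the question directly. This is only a minimal sketch: it assumes every escape is a decimal code point followed by at most one ? placeholder, as in the question's string, and rtf_unicode_to_text is a made-up helper name, not a built-in.

```python
import re

def rtf_unicode_to_text(raw):
    """Replace RTF-style \\uN? escapes with the character for code point N.

    Hypothetical helper; assumes a single optional '?' fallback character
    after each escape, as in the question's example.
    """
    def replace(match):
        code = int(match.group(1))
        # RTF stores the value as a signed 16-bit number, so negative
        # values are wrapped back into the 0..65535 range.
        if code < 0:
            code += 65536
        return chr(code)

    # \u, a decimal number, then the optional '?' placeholder
    return re.sub(r"\\u(-?\d+)\??", replace, raw)

rawstr = r"3 \u176? \u177? 0.2\u176? (2\u952?)"
print(rtf_unicode_to_text(rawstr))
```

This runs on any platform with Python 3, but it is not a full RTF parser: other control words or multi-character fallbacks would need extra handling.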
Note that if you need to transform a string to bytes in Python, you can use the bytes function.
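my_str = r'3\u00B0 \u00b1 0.2\u00B0 2\u03B8'
# Python 2: a str is already a byte string; 'unicode_escape' (not 'string_escape') handles \uXXXX
print my_str.decode('unicode_escape') # python 2
# Python 3: convert to bytes first, then decode
my_bytes = bytes(my_str, 'utf-8')
print(my_bytes.decode('unicode_escape')) # python 3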
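For Python 3 only, the round trip through bytes can be wrapped in a small function. This is a sketch under one assumption: the text is ASCII apart from the escapes. The unicode_escape codec treats the remaining bytes as Latin-1, so literal non-ASCII characters encoded as UTF-8 would come out garbled; encoding to ASCII makes that case fail loudly instead. unescape_python_style is a hypothetical name.

```python
def unescape_python_style(text):
    """Decode Python-style \\uXXXX escapes in an otherwise ASCII string.

    Hypothetical helper. Encoding with 'ascii' raises UnicodeEncodeError
    if the input already contains non-ASCII characters, instead of
    silently producing mojibake.
    """
    return text.encode('ascii').decode('unicode_escape')

decoded = unescape_python_style(r'3\u00B0 \u00b1 0.2\u00B0 2\u03B8')
print(decoded)
```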
Upvotes: 2