user1174454

Reputation: 21

Convert a string '\u05d9\u05d7\u05e4\u05d9\u05dd' to its Unicode characters in Python

I get a JSON object from a URL whose values look like the one above: title:'\u05d9\u05d7\u05e4\u05d9\u05dd'

I need to print these values as readable text. However, I can't convert them, because they are treated as literal strings rather than unicode objects.

Calling unicode(myStr) does not work.
Writing a = u'%s' % myStr does not work either.

The value is stored as escaped text, so both return the same sequence of characters. Does anyone know how I can do this conversion in Python?
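
A minimal Python 3 sketch of the situation described (the question itself uses Python 2, where unicode() plays the role that str() plays here). The assumption is that the value holds backslash-u escape text rather than the Hebrew letters themselves, so converting it to a string type again changes nothing:

```python
# The value as received: a backslash, a 'u', and four hex digits per character.
escaped = r"\u05d9\u05d7\u05e4\u05d9\u05dd"

print(len(escaped))               # 30 ASCII characters, not 5 Hebrew letters
print(str(escaped) == escaped)    # converting to a string type changes nothing
print("%s" % escaped == escaped)  # neither does string formatting
```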

Maybe the right approach is to change the encoding of the response. If so, how do I do that?

Upvotes: 2

Views: 3862

Answers (3)

jfs

Reputation: 414875

JSON strings always use double quotes ("), never single quotes ('), so '\u05d9\u05d7\u05e4\u05d9\u05dd' is not a JSON string.
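
A small Python 3 sketch of the quoting rule: the same escape text parses when wrapped in double quotes and is rejected by the json module when wrapped in single quotes. The variable names are illustrative only.

```python
import json

# Valid JSON: the string is delimited by double quotes, so the \uXXXX
# escapes are decoded into real characters.
title = json.loads('"\\u05d9\\u05d7\\u05e4\\u05d9\\u05dd"')
print(len(title))  # 5

# Single quotes are not valid JSON string delimiters.
try:
    json.loads("'\\u05d9\\u05d7\\u05e4\\u05d9\\u05dd'")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print("single-quoted text parses as JSON:", single_quotes_ok)
```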

If you load valid JSON text with the json module, all strings in the result are already Unicode, so you don't need to decode anything. To display them, you might need to encode them using a character encoding suitable for your terminal.
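
Roughly the same idea as a Python 3 sketch: json.loads turns the escapes into a str, and encoding is a separate, explicit step for the output device. UTF-8 as the terminal encoding is an assumption here; in Python 3, print() performs this step automatically using sys.stdout.encoding.

```python
import json

# JSON text as it might arrive: ASCII-only, with \uXXXX escapes.
json_text = '{"title": "\\u05d9\\u05d7\\u05e4\\u05d9\\u05dd"}'

data = json.loads(json_text)
title = data["title"]            # already a str of 5 Hebrew letters; no decoding needed

# Encode for the output device (assumed UTF-8 here).
encoded = title.encode("utf-8")
print(len(title), len(encoded))  # 5 characters, 10 bytes
```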

Example

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json

d = json.loads(u'''{"title": "\u05d9\u05d7\u05e4\u05d9\u05dd"}''')
print d['title'].encode('utf-8') # -> יחפים

Note: it is a coincidence that the source encoding (declared in the coding line at the top) equals the output encoding (used in the last line); the two are unrelated and can differ.
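
To illustrate that the output encoding is an independent choice, here is a Python 3 sketch that encodes the same text for three different targets. The specific codecs (UTF-8 and two single-byte Hebrew encodings) are just examples of what a terminal might use.

```python
# The same str can be encoded for different output devices; the choice is
# independent of how the source file itself is encoded.
title = "\u05d9\u05d7\u05e4\u05d9\u05dd"

as_utf8 = title.encode("utf-8")           # 2 bytes per Hebrew letter
as_cp1255 = title.encode("cp1255")        # Windows Hebrew code page: 1 byte per letter
as_iso8859_8 = title.encode("iso8859_8")  # ISO Hebrew: 1 byte per letter

print(len(as_utf8), len(as_cp1255), len(as_iso8859_8))  # 10 5 5
```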

If you'd like to see fewer \uxxxx sequences in JSON text, you can pass ensure_ascii=False to json.dumps:

Example

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json

L = ['יחפים']
json_text = json.dumps(L) # default encoding for input bytes is utf-8
print json_text # all non-ASCII characters are escaped
json_text = json.dumps(L, ensure_ascii=False)
print json_text # output as-is

Output

["\u05d9\u05d7\u05e4\u05d9\u05dd"]
["יחפים"]
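
A Python 3 version of the same comparison, as a sketch. In Python 3, json.dumps returns str, so the note above about the default encoding for input bytes no longer applies. Both outputs decode back to the same data.

```python
import json

titles = ["\u05d9\u05d7\u05e4\u05d9\u05dd"]

escaped = json.dumps(titles)                       # default ensure_ascii=True
readable = json.dumps(titles, ensure_ascii=False)  # non-ASCII characters kept as-is

print(escaped)                 # ["\u05d9\u05d7\u05e4\u05d9\u05dd"]
print("\\u" in readable)       # False: no escape sequences in this form
print(json.loads(escaped) == json.loads(readable) == titles)  # True
```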

Upvotes: 3

Andrew Clark

Reputation: 208705

If for some reason you have a string like this outside of a JSON object, you can decode it with the raw_unicode_escape codec to get the unicode string you want:

>>> '\u05d9\u05d7\u05e4\u05d9\u05dd'.decode('raw_unicode_escape')
u'\u05d9\u05d7\u05e4\u05d9\u05dd'
>>> print '\u05d9\u05d7\u05e4\u05d9\u05dd'.decode('raw_unicode_escape')
יחפים
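
In Python 3, str has no .decode() method, so a rough equivalent goes through bytes first. This sketch assumes the input is pure ASCII escape text; for data that is really JSON, json.loads is the safer choice, since it also applies JSON-specific rules such as combining surrogate pairs.

```python
# Escape text as received (pure ASCII).
escaped = r"\u05d9\u05d7\u05e4\u05d9\u05dd"

# Encoding ASCII text to bytes is lossless; the codec then interprets \uXXXX.
title = escaped.encode("ascii").decode("raw_unicode_escape")
print(len(title))  # 5
```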

Upvotes: -1

Ned Batchelder

Reputation: 376052

You should use the json module to load the JSON data into a Python object. It will take care of this for you, and you'll have Unicode strings. Then you can encode them to match your output device, and print them.
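
A Python 3 sketch of this approach, which also covers the question about the response encoding: pass the raw response bytes straight to json.loads (accepted since Python 3.6, which detects UTF-8, UTF-16, or UTF-32) instead of changing anything about the response. Both helper names are hypothetical, and the demonstration uses a local byte string rather than a real URL.

```python
import json
import urllib.request


def parse_json_body(raw: bytes):
    """Hypothetical helper: parse a JSON response body into Python objects."""
    # json.loads accepts bytes and detects the UTF encoding itself.
    return json.loads(raw)


def fetch_json(url: str):
    """Hypothetical helper: download url and parse the body as JSON."""
    with urllib.request.urlopen(url) as response:
        return parse_json_body(response.read())


# Offline demonstration with a body like the one described in the question.
body = b'{"title": "\\u05d9\\u05d7\\u05e4\\u05d9\\u05dd"}'
data = parse_json_body(body)
print(len(data["title"]))  # 5
```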

Upvotes: 4
