Reputation: 1005
I have a simple python script that gets the text of a tweet.
However, emojis are somehow encoded, so they look like this in the output \xf0\x9f\x90\xa3.
Is there a way to find out what emoji this is from this output?
Upvotes: 3
Views: 7798
Reputation: 155323
Odds are it's UTF-8 encoded (along with the rest of the data, it's just that ASCII text happens to be be rendered identically in ASCII and UTF-8).
If you have a bytes
like b'\xf0\x9f\x90\xa3'
, you'd just do:
b = b'\xf0\x9f\x90\xa3'
txt = b.decode('utf-8')
If you received it as a str
, this is probably a mistaken decoding as latin-1
or some other code page, so just undo it and redo with UTF-8:
b = '\xf0\x9f\x90\xa3'
txt = b.encode('latin-1').decode('utf-8')
# If it's not latin-1, could be sys.getdefaultencoding()
Which gets an ordinal of 0x1f423 (my computer can't display it, or I'd have added it here), which is in the correct range for most of the emoji. As noted in the comments, unicodedata
reports the character as a HATCHING CHICK
.
Upvotes: 1