UGeorge

Reputation: 101

Convert strings to emoji in Python

I have a collection of tweets and I want to check the emojis in them, but it looks like the procedure that wrote the collection converted every emoji to a text emoticon: for example, '😊' appears as ':-)' in the text, '😃' appears as ':D', and so on for all emojis. Comparing the encoded bytes does not help: ':-)'.encode('utf-8') equals b':-)', while '😊'.encode('utf-8') equals b'\xf0\x9f\x98\x8a', so the equality check fails. The same happens with UTF-16: ':-)'.encode('utf-16') equals b'\xff\xfe:\x00-\x00)\x00' and '😊'.encode('utf-16') equals b'\xff\xfe=\xd8\n\xde'. Is there any way to convert text representations such as ':-)' back to the emoji '😊'?

Upvotes: 3

Views: 4378

Answers (1)

JosefZ

Reputation: 30103

Use a dictionary to convert text emoticons back to emoji, e.g. as follows:

>>> dict_emo = { ':-)'  : b'\xf0\x9f\x98\x8a',
...              ':)'   : b'\xf0\x9f\x98\x8a',
...              '=)'   : b'\xf0\x9f\x98\x8a',  # Smile or happy
...              ':-D'  : b'\xf0\x9f\x98\x83',
...              ':D'   : b'\xf0\x9f\x98\x83',
...              '=D'   : b'\xf0\x9f\x98\x83',  # Big smile
...              '>:-(' : b'\xF0\x9F\x98\xA0',
...              '>:-o' : b'\xF0\x9F\x98\xA0'   # Angry face
...              }
>>> print( dict_emo[':)'].decode('utf-8'))
😊
>>> print( dict_emo['>:-('].decode('utf-8'))
😠
>>> print( dict_emo[':-D'].decode('utf-8'))
😃
>>>
>>>
>>> dict_emot= { ':-)'  : b'\xf0\x9f\x98\x8a'.decode('utf-8'),
...              ':)'   : b'\xf0\x9f\x98\x8a'.decode('utf-8'),
...              '=)'   : b'\xf0\x9f\x98\x8a'.decode('utf-8'),  # Smile or happy
...              ':-D'  : b'\xf0\x9f\x98\x83'.decode('utf-8'),
...              ':D'   : b'\xf0\x9f\x98\x83'.decode('utf-8'),
...              '=D'   : b'\xf0\x9f\x98\x83'.decode('utf-8'),  # Big smile
...              '>:-(' : b'\xF0\x9F\x98\xA0'.decode('utf-8'),
...              '>:-o' : b'\xF0\x9F\x98\xA0'.decode('utf-8')   # Angry face
...              }
>>> print( dict_emot[':)'] )
😊
>>> print( dict_emot['>:-o'] )
😠
>>> print( dict_emot['=D'] )
😃
>>>
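
The dictionaries above look up one emoticon at a time. To replace emoticons inside a whole tweet, one possible approach is a regular expression built from the dictionary keys. The following is only a minimal sketch: the names EMOTICON_TO_EMOJI and emoticons_to_emoji are made up for illustration, the mapping simply mirrors the small one above, and the emoji are written as \U escapes instead of decoded byte strings.

```python
import re

# Illustrative mapping only (same entries as the dictionaries above);
# extend it with whatever emoticons actually occur in your data.
EMOTICON_TO_EMOJI = {
    ':-)': '\U0001F60A',   # smiling face with smiling eyes
    ':)': '\U0001F60A',
    '=)': '\U0001F60A',
    ':-D': '\U0001F603',   # big smile
    ':D': '\U0001F603',
    '=D': '\U0001F603',
    '>:-(': '\U0001F620',  # angry face
    '>:-o': '\U0001F620',
}

# Escape each key (they contain regex metacharacters) and try longer
# emoticons first so that a longer key is preferred over a shorter one
# starting at the same position.
_EMOTICON_RE = re.compile(
    '|'.join(re.escape(key)
             for key in sorted(EMOTICON_TO_EMOJI, key=len, reverse=True))
)


def emoticons_to_emoji(text):
    """Replace every known text emoticon in `text` with its emoji."""
    return _EMOTICON_RE.sub(lambda m: EMOTICON_TO_EMOJI[m.group(0)], text)


print(emoticons_to_emoji('Great day :-) see you =D'))
```

Note that this matches emoticons anywhere in the string, so a sequence such as ':D' inside a word or identifier would be replaced as well; stricter matching (for example, requiring surrounding whitespace) may be needed for real tweets.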

Unfortunately, there are at least two tasks remaining:

Upvotes: 5
