Kate Fernando

Reputation: 381

How to read emoticons in a CSV file?

I am trying to detect emoticons in my sentences and assign a sentiment value to each one. I found a list of emoticons with their sentiment values and copied it into a CSV file, where each row holds an emoticon's Unicode value and its sentiment value, as shown below.

[Image: screenshot of the CSV file]

When I check whether the sentence contains an emoticon with a hard-coded literal, as below, it works:

if "\U0001f914" in sentence:
    print("in")

But when I loop through the CSV file (emoticons and sentiment values) and check whether each emoticon occurs in the sentence, it doesn't work.

Below is my code:

Method 1:

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')

    if senti_emoji_unicode in sentence:
        print("in")

Method 2:

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    senti_emoji_unicode = '"'+senti_emoji_unicode+'"'

    if senti_emoji_unicode in sentence:
        print("in")

Below is the full code, updated as per the answers:

import os

# The CSV file sits next to this script
file_name_emoji = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'emoji sentiment.csv')
with open(file_name_emoji, 'r', encoding='utf-8') as fo_emoji:
    lines_emoji = fo_emoji.readlines()

# sentence is defined earlier in the script
for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    # The only 'unicode_escape' decode in the script, so the error described below comes from here
    emoji = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')
    score = float(senti_emoji_score)
    if emoji in sentence:
        print('--------------------------------------')

I am getting the error: 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape. I have seen many posts about this error that suggest fixes such as adding 'r' and changing ''. But these fixes cannot be applied in my scenario, since I am reading the values from a dynamic list rather than writing them as literals. I tried the code below to add the 'r' prefix, but the same error appears.

raw_s = "r'{0}'".format(senti_emoji_unicode)
raw = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')

Upvotes: 0

Views: 754

Answers (1)

Mark Tolonen

Reputation: 177901

Each line of your data file is read and split into two multi-character strings. The escape code is not evaluated (it remains the ten literal characters \U0001f914 rather than a single emoji), and the decimal value is a string, not a float. Both must be converted.
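
To make the difference concrete, here is a minimal sketch comparing a literal typed in source code with the same text as it would arrive from a file. The variable names are only illustrative, and the raw string stands in for a value read from the CSV:

```python
# A literal in source code: Python evaluates the escape when compiling.
literal = "\U0001f914"
# What a CSV field holds: ten separate characters (backslash, U, eight hex digits).
# The raw string here stands in for text read from the file.
from_file = r"\U0001f914"

print(len(literal))    # 1
print(len(from_file))  # 10

sentence = "hmm \U0001f914"
print(literal in sentence)    # True
print(from_file in sentence)  # False

# Evaluating the escape at run time turns the ten characters into one.
decoded = from_file.encode("ascii").decode("unicode_escape")
print(decoded == literal)     # True
```

Wrapping the field in quotes or prefixing it with r (as in Method 2 and the raw_s attempt) only adds more characters to the string; it never triggers escape evaluation.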

Reproducible example:

lines = r'''
\U0001f602,0.221
\U00002764,0.746
'''.strip().splitlines()

for line in lines:
    print(line)

sentence = 'hello ❤'

for line in lines:
    emoji_string,score_string = line.split(',')
    emoji = emoji_string.encode('ascii').decode('unicode_escape')
    score = float(score_string)
    print(emoji,score,emoji in sentence)

Output:

\U0001f602,0.221
\U00002764,0.746
😂 0.221 False
❤ 0.746 True
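
If some rows still raise the "truncated \UXXXXXXXX escape" error, one possible cause (an assumption, since the actual file is not shown) is a row whose code has fewer than eight hex digits after \U. The sketch below uses the same conversion but collects such rows instead of crashing. The function name load_emoji_scores, the rejected list, and the sample data are hypothetical, chosen only for illustration:

```python
import csv
import io


def load_emoji_scores(lines):
    """Return ({emoji: score}, [rows that could not be converted])."""
    scores, rejected = {}, []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        try:
            escape, score_text = row
            emoji = escape.strip().encode("ascii").decode("unicode_escape")
            scores[emoji] = float(score_text)
        except ValueError:
            # Covers wrong field counts, non-ASCII fields, malformed
            # escapes (UnicodeDecodeError) and non-numeric scores.
            rejected.append(row)
    return scores, rejected


# Hypothetical sample data; the third row has only seven hex digits.
sample = io.StringIO("\\U0001f602,0.221\n\\U00002764,0.746\n\\U0001f60,0.5\n")
scores, rejected = load_emoji_scores(sample)

sentence = "hello \u2764"
found = {e: s for e, s in scores.items() if e in sentence}
print(found)
print(rejected)
```

Printing the rejected rows shows exactly which lines of the file need fixing, and the dictionary lets you look up the score for every emoji found in a sentence.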

Upvotes: 1
