Reputation: 381
I am trying to detect emoticons in my sentences and assign a sentiment value to each one. I found a list of emoticons with their sentiment values and copied it to a CSV file that holds each emoticon's Unicode escape and its sentiment value, as shown below.
When I check whether the sentence contains an emoticon using a string literal, as below, it works:
if "\U0001f914" in sentence:
    print("in")
But when I loop through the CSV file (emoticons and sentiment values) and check whether each emoticon exists in the sentence, it doesn't work.
Below is my code:
Method 1-
for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    if senti_emoji_unicode in sentence:
        print("in")
Method 2-
for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    senti_emoji_unicode = '"'+senti_emoji_unicode+'"'
    if senti_emoji_unicode in sentence:
        print("in")
Below is the updated full code, as per the answers:
file_name_emoji = os.path.dirname(os.path.abspath(__file__)) + '/emoji sentiment.csv'
fo_emoji = open(file_name_emoji, 'r', encoding='utf-8')
lines_emoji = fo_emoji.readlines()
fo_emoji.close()
for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    emoji = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')
    score = float(senti_emoji_score)
    if emoji in sentence:
        print('--------------------------------------')
I am getting the error: 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape. I have seen many posts related to this issue that suggest fixes like adding 'r' and changing ''. But these fixes cannot be applied in my scenario, since the values come from a dynamic list rather than from string literals. I tried the following to apply the 'r' prefix, but the same error appears.
raw_s = "r'{0}'".format(senti_emoji_unicode)
raw = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')
Upvotes: 0
Views: 754
Reputation: 177901
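Each line of your data file is read and split into two multi-character strings. The escape code is not evaluated (it is still the literal characters backslash, U and eight hex digits, not a single emoji), and the decimal value is a string, not a float. Both must be converted.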
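To make the difference concrete, here is a minimal sketch comparing a string literal with the same text as it arrives from a file. The name `from_file` is a hypothetical stand-in for the first CSV field, not something from your code:

```python
# A literal in source code: Python evaluates the escape when compiling it.
literal = "\U0001f914"

# What a CSV field actually holds: a backslash, a 'U' and eight hex digits.
from_file = r"\U0001f914"  # hypothetical stand-in for the first CSV field

print(len(literal))    # 1
print(len(from_file))  # 10

# Evaluating the escape turns the ten characters into the single emoji.
decoded = from_file.encode('ascii').decode('unicode_escape')
print(decoded == literal)  # True
```

This is why `senti_emoji_unicode in sentence` never matches: it searches the sentence for a ten-character substring that starts with a backslash.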
Reproducible example:
lines = r'''
\U0001f602,0.221
\U00002764,0.746
'''.strip().splitlines()
for line in lines:
    print(line)
sentence = 'hello ❤'
for line in lines:
    emoji_string,score_string = line.split(',')
    emoji = emoji_string.encode('ascii').decode('unicode_escape')
    score = float(score_string)
    print(emoji,score,emoji in sentence)
Output:
\U0001f602,0.221
\U00002764,0.746
😂 0.221 False
❤ 0.746 True
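The "truncated \UXXXXXXXX escape" error at position 0-8 would be consistent with at least one field in the file having fewer than eight hex digits after `\U`. That is an assumption, since the file itself isn't shown. Below is a sketch that validates each field before converting it and skips anything that doesn't fit; `parse_emoji_scores` and `sentence_score` are hypothetical helper names, and the sample lines are made up for illustration:

```python
import re

# Matches exactly one \UXXXXXXXX escape (eight hex digits) and nothing else.
ESCAPE_RE = re.compile(r'\\U([0-9a-fA-F]{8})')

def parse_emoji_scores(lines):
    """Map each emoji to its score, skipping lines that don't fit the format."""
    scores = {}
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) != 2:
            continue
        match = ESCAPE_RE.fullmatch(parts[0].strip())
        if match is None:
            continue  # header row, truncated escape, or other unexpected text
        try:
            emoji = chr(int(match.group(1), 16))
            score = float(parts[1])
        except ValueError:
            continue  # code point out of range, or score is not a number
        scores[emoji] = score
    return scores

def sentence_score(sentence, scores):
    """Sum the scores of every known emoji that occurs in the sentence."""
    return sum(score for emoji, score in scores.items() if emoji in sentence)

# Made-up sample data, including two lines that should be skipped.
lines = [
    'emoji,score',          # header row: skipped
    r'\U0001f602,0.221',
    r'\U00002764,0.746',
    r'\U0001f60,0.5',       # only seven hex digits: skipped
]
scores = parse_emoji_scores(lines)
print(len(scores))                             # 2
print(sentence_score('hello \u2764', scores))  # 0.746
```

Building the character with `chr(int(hex_digits, 16))` avoids the `unicode_escape` codec entirely, so a malformed field is skipped rather than raising partway through the file.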
Upvotes: 1