Reputation: 27
I have the following code. It runs without any problems in Google Colab (https://colab.research.google.com/drive/1Rbx1iERTZI6Tahm4dBtt0P9jcAVDbxSa?usp=sharing) and gives the expected result (picture below). But the same code in Jupyter Notebook shows a score of 0 for every line. The only difference is minor: in Jupyter I use nltk.data.path[4] instead of nltk.data.path[0], which shouldn't matter. What do you think is the problem?
!pip install feedparser
import feedparser
import pickle
import time
import requests
import nltk
import csv
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('vader_lexicon')
posts = []
rss_url='https://www.pravda.com.ua/ukr/rss/view_news/'
response = feedparser.parse(rss_url)
for each in response['entries']:
    if each['title'] in [x['title'] for x in posts]:
        pass
    else:
        posts.append({
            "title": each['title'],
            "link": each['links'][0]['href'],
            "tags": [x['term'] for x in each['tags']],
            "date": time.strftime('%Y-%m-%d', each['published_parsed'])
        })
for i, post in enumerate(p['title'] for p in posts):
    print(i, post)
url = 'https://raw.githubusercontent.com/lang-uk/tone-dict-uk/master/tone-dict-uk.tsv'
r = requests.get(url)
with open(nltk.data.path[0]+'/tone-dict-uk.tsv', 'wb') as f:
    f.write(r.content)
d = {}
with open(nltk.data.path[0]+'/tone-dict-uk.tsv', 'r') as csv_file:
    for row in csv.reader(csv_file, delimiter='\t'):
        d[row[0]] = float(row[1])
from nltk.sentiment.vader import SentimentIntensityAnalyzer
SIA = SentimentIntensityAnalyzer()
SIA.lexicon.update(d)
for i, post in enumerate(p['title'] for p in posts):
    print(i, post, SIA.polarity_scores(post)["compound"])
Upvotes: 0
Views: 95
Reputation: 27
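with open(nltk.data.path[4]+'/tone-dict-uk.tsv', 'r', encoding='utf-8') as csv_file:
    for row in csv.reader(csv_file, delimiter='\t'):
        d[row[0]] = float(row[1])
Adding encoding='utf-8' to open() fixes the bug. Without it, open() falls back to the platform's default encoding, which may not be UTF-8 (for example on Windows). The Ukrainian words in the lexicon are then decoded incorrectly and never match the words in the titles, so every score comes out as 0.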
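A minimal sketch of why the encoding matters. The helper load_lexicon, the two-word sample file, and the choice of cp1251 as the "wrong" default are all assumptions made for illustration; the actual default encoding on the Jupyter machine is not known from the question.

```python
import csv
import os
import tempfile


def load_lexicon(path, encoding):
    """Read a word<TAB>score TSV file into a dict using the given encoding."""
    lexicon = {}
    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            lexicon[row[0]] = float(row[1])
    return lexicon


# Hypothetical two-entry sample standing in for tone-dict-uk.tsv,
# written as raw UTF-8 bytes the same way r.content is written above.
sample = "добрий\t2.0\nпоганий\t-2.0\n"
path = os.path.join(tempfile.mkdtemp(), "sample-tone-dict.tsv")
with open(path, "wb") as f:
    f.write(sample.encode("utf-8"))

# Assumed non-UTF-8 platform default (cp1251) versus explicit UTF-8.
wrong = load_lexicon(path, "cp1251")
right = load_lexicon(path, "utf-8")

print("добрий" in wrong)  # the key was decoded into mojibake, so the lookup fails
print("добрий" in right)  # decoded correctly, so the lookup succeeds
```

With the wrong encoding the file still loads without an error and the dict has the same number of entries, but none of its keys match real Ukrainian words, which is consistent with the analyzer silently returning 0 for every title.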
Upvotes: 0