Roman Stasiuk

Reputation: 27

NLTK code works in Google Colab and returns real values, but returns only zeros in Jupyter Notebook

I have the following code. It runs without any problems in Google Colab (https://colab.research.google.com/drive/1Rbx1iERTZI6Tahm4dBtt0P9jcAVDbxSa?usp=sharing) and gives the expected result (picture below). But the same code in Jupyter Notebook shows a score of 0 for every line. The only difference is minor: in Jupyter I use nltk.data.path[4] instead of nltk.data.path[0], but that shouldn't matter. What do you think the problem is?

!pip install feedparser
import feedparser
import pickle
import time
import requests
import nltk
import csv

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('vader_lexicon')

posts = []
rss_url='https://www.pravda.com.ua/ukr/rss/view_news/'
response = feedparser.parse(rss_url)
for each in response['entries']:
  if each['title'] in [x['title'] for x in posts]:
    pass
  else:
    posts.append({
        "title": each['title'],
        "link": each['links'][0]['href'],
        "tags": [x['term'] for x in each['tags']],
        "date": time.strftime('%Y-%m-%d', each['published_parsed'])
        })

for i, post in enumerate(p['title'] for p in posts):
  print(i, post)

url = 'https://raw.githubusercontent.com/lang-uk/tone-dict-uk/master/tone-dict-uk.tsv'
r = requests.get(url)
with open(nltk.data.path[0]+'/tone-dict-uk.tsv', 'wb') as f:
    f.write(r.content)

d = {}
with open(nltk.data.path[0]+'/tone-dict-uk.tsv', 'r') as csv_file:
    for row in csv.reader(csv_file, delimiter='\t'):
        d[row[0]] = float(row[1])

from nltk.sentiment.vader import SentimentIntensityAnalyzer
SIA = SentimentIntensityAnalyzer()

SIA.lexicon.update(d)

for i, post in enumerate(p['title'] for p in posts):
  print(i, post, SIA.polarity_scores(post)["compound"])

[Image: result of running the code in Google Colab]

Upvotes: 0

Views: 95

Answers (1)

Roman Stasiuk

Reputation: 27

with open(nltk.data.path[4]+'/tone-dict-uk.tsv', 'r', encoding='utf-8') as csv_file:
    for row in csv.reader(csv_file, delimiter='\t'):
        d[row[0]] = float(row[1])

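Adding encoding='utf-8' to open() fixed the bug. Without it, open() falls back to the platform's default encoding, which is most likely not UTF-8 on the Jupyter machine (on Windows it is typically a legacy code page such as cp1251). The Ukrainian words in the dictionary were then decoded into garbage, so none of them matched the words in the titles and every score came out as 0.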
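
To illustrate the effect, here is a minimal sketch that needs no network access. The two-row sample and the load_lexicon helper are made up for the demonstration (they are not part of the original code), and cp1251 only stands in for whatever default encoding the Jupyter machine actually uses.

```python
import csv
import os
import tempfile

# Hypothetical two-row sample in the same word<TAB>score layout as tone-dict-uk.tsv.
sample = "добре\t2\nпогано\t-2\n"


def load_lexicon(path, encoding):
    """Read a word<TAB>score file into a dict, decoding it with the given encoding."""
    lexicon = {}
    with open(path, "r", encoding=encoding, newline="") as csv_file:
        for row in csv.reader(csv_file, delimiter="\t"):
            lexicon[row[0]] = float(row[1])
    return lexicon


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.tsv")
    # The downloaded file is saved as raw bytes, which are UTF-8 encoded.
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))

    good = load_lexicon(path, "utf-8")   # explicit encoding, as in the fix
    bad = load_lexicon(path, "cp1251")   # assumed stand-in for a non-UTF-8 default

print("добре" in good)  # the real word is a key
print("добре" in bad)   # the real word is missing; the keys are mojibake
print(list(bad))
```

Both dictionaries contain two entries, but only the one read as UTF-8 has keys that can match real text. To see which default your own environment uses, check locale.getpreferredencoding(False).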

Upvotes: 0
