Reputation: 54
I have a text file which looks like this:
And I'm trying to do sentiment analysis on each separate sentence. I'd like to write the results to another text file in this form:
First I'm trying to print them to see if it works, but I keep running into errors and can't figure it out. This is the code I wrote that doesn't work:
import stanza

def sentiment(f_name, pipeline):
    x = open(f_name, encoding='utf-8')
    text = x.read().splitlines()
    for line in range(rn):
        doc = pipeline(text[line])
        print(line, doc.sentiment)

rn = 10  # number of lines to process, for tests
filename = input("Enter the name (with format) of the text you want to filter:\n")
lang = input("In what language is the text typed? ('ca' for catalan, 'es' for spanish, 'en' for english...)\n")
stanza.download(lang, verbose=False)  # no need to check if it's downloaded every time, only the first time
nlp = stanza.Pipeline(lang=lang, verbose=False)  # setting the pipeline, 'ca' for catalan
sentiment(filename, nlp)
And this is the traceback I get:
Traceback (most recent call last):
  File "C:\Users\svp12\PycharmProjects\practiques\main.py", line 233, in <module>
    sentiment(filename, nlp)
  File "C:\Users\svp12\PycharmProjects\practiques\main.py", line 219, in sentiment
    print(line, doc.sentiment)
AttributeError: 'Document' object has no attribute 'sentiment'
Upvotes: 0
Views: 452
Reputation: 557
The sentiment can be accessed on the sentences, not on the document itself. See here: https://stanfordnlp.github.io/stanza/sentiment.html
nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment')
doc = nlp('I hate that they banned Mox Opal')
for i, sentence in enumerate(doc.sentences):
    print(i, sentence.sentiment)
I can see a couple of issues with what you're doing. The first is that there is no sentiment model available for Spanish or Catalan. I can do some investigation to see if there's an appropriate dataset for those languages, unless you happen to know of one. The other issue is that there's no guarantee tweets will be one sentence per line, or that the tokenization model will treat them that way. You can get around this by turning off sentence splitting:
https://stanfordnlp.github.io/stanza/tokenize.html#tokenization-without-sentence-segmentation
nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment', tokenize_no_ssplit=True)
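Putting the pieces together, here's a sketch of how the original function could be restructured: each non-empty line is run through the pipeline as its own document, and the result is written to an output file as `index<TAB>label`. The 0/1/2 → negative/neutral/positive mapping follows Stanza's sentiment convention; the file names and the `label_for`/`analyze_file` helpers are just illustrative names, not part of any API:

```python
# Sketch: per-line sentiment with Stanza, one tweet per line,
# writing "index<TAB>label" to an output file.

# Stanza's sentiment processor returns 0 (negative), 1 (neutral), 2 (positive).
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}

def label_for(code):
    """Map Stanza's integer sentiment code to a readable label."""
    return SENTIMENT_LABELS.get(code, "unknown")

def analyze_file(in_name, out_name, pipeline):
    """Run the pipeline on each non-empty line and write 'i<TAB>label'."""
    with open(in_name, encoding="utf-8") as f:
        lines = [ln for ln in f.read().splitlines() if ln.strip()]
    with open(out_name, "w", encoding="utf-8") as out:
        for i, line in enumerate(lines):
            doc = pipeline(line)
            # With tokenize_no_ssplit=True each document is a single
            # sentence, so doc.sentences[0] is the whole line.
            code = doc.sentences[0].sentiment
            out.write(f"{i}\t{label_for(code)}\n")

def main():
    """Example driver (requires stanza and a downloaded English model)."""
    import stanza
    nlp = stanza.Pipeline(lang="en",
                          processors="tokenize,sentiment",
                          tokenize_no_ssplit=True)
    analyze_file("tweets.txt", "results.txt", nlp)
```

Because `analyze_file` only needs a callable that returns a document with a `sentences` list, it works unchanged whichever language's pipeline you pass in, once a sentiment model exists for that language.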
Edit: there is now a Spanish sentiment model based on TASS2020. I found a couple possible Catalan datasets, but one was tiny and aspect based, and the other only had positive or negative, so neither seemed particularly suitable.
Upvotes: 3