Reputation: 361
Suppose I have the following customer review and I want to know the sentiment about the hotel and the food:
"The room we got was nice but the food was average"
So had this been in a dataframe of reviews, the output from the analysis would have looked like:
Reviews             Hotel   Food
The room was ...    Pos     Neg
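For illustration, the desired output could be stored as a pandas dataframe along these lines (pandas and the exact column names are just my assumption of how it would be kept):

import pandas as pd

# desired output: one sentiment column per aspect
desired = pd.DataFrame({
    "Reviews": ["The room we got was nice but the food was average"],
    "Hotel": ["Pos"],
    "Food": ["Neg"],
})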
I have come across multiple tutorials on Kaggle and Medium which teach sentiment analysis, but they always look for the overall sentiment.
Please help me out if you know a way, are aware of any tutorials, or know what terms to google to get around this problem. Thanks!
Edit: Please refer to these slides: http://sentic.net/sentire2011ott.pdf. They seem to be lecture notes. Does anyone know of a Python implementation of the same? Thanks!
Edit: This question pertains to ABSA (Aspect Based Sentiment Analysis)
Upvotes: 1
Views: 1692
Reputation: 14425
Assuming the customer reviews are 1 to N sentences long, some of which review multiple items (for example, "the room was great but the staff were rude"), you might want to perform sentiment analysis on individual segments of text, separated by punctuation as well as conjunctions.
This will require a combination of pre-processing steps that first segment the review text into sentences and then split those sentences on conjunctions (such as "but", "so", etc.).
Sample code
First, sentence tokenization.
Assuming the review text is "Nice central hotel. Room was great but the staff were rude. Very easy to reach from the central station":
>>> from nltk.tokenize import sent_tokenize
>>> review_text = "Nice central hotel. Room was great but the staff were rude. Very easy to reach from the central station."
>>> sentences = sent_tokenize(review_text)
>>> sentences
['Nice central hotel.',
 'Room was great but the staff were rude.',
 'Very easy to reach from the central station.']
Next, split each sentence on a few conjunctions:
import re

# split a sentence into segments on contrasting conjunctions such as "but" and "yet"
def split_conj(text):
    return map(str.strip, re.sub('(but|yet)', "|", text).split('|'))

segments = []
for sent in sentences:
    segments.extend(split_conj(sent))
Please note you would need to do some further preprocessing on the segments, which (based on the example review text) look like:
['Nice central hotel.',
'Room was great',
'the staff were rude.',
'Very easy to reach from the central station.']
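As a minimal sketch of such further preprocessing (the specific cleanup steps below are just one possibility, not a fixed recipe), you could lowercase each segment, strip punctuation, and drop fragments too short to carry sentiment:

import re

def preprocess(segment):
    # lowercase, remove punctuation, collapse repeated whitespace
    cleaned = re.sub(r'[^\w\s]', '', segment.lower())
    return re.sub(r'\s+', ' ', cleaned).strip()

# keep only segments with more than one word (from the `segments` list above)
clean_segments = [preprocess(s) for s in segments if len(s.split()) > 1]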
Next, create your dataset linking the review ID to the individual segment IDs. So your dataframe columns are:
review ID | segment ID | segment text | label
# the label could be a numerical score
# (in the range -1 to +1) instead of just the discrete values -1 and +1
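A minimal sketch of building that dataframe with pandas, assuming `reviews` is a list of raw review strings and reusing `split_conj` from above:

import pandas as pd
from nltk.tokenize import sent_tokenize

# one row per segment, keyed back to its parent review
rows = []
for review_id, review_text in enumerate(reviews):   # `reviews`: your list of raw review strings
    segments = []
    for sent in sent_tokenize(review_text):
        segments.extend(split_conj(sent))            # split_conj defined earlier
    for segment_id, segment in enumerate(segments):
        rows.append({"review_id": review_id,
                     "segment_id": segment_id,
                     "segment_text": segment})

df = pd.DataFrame(rows)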
Next, perform sentiment analysis on the individual segments and then combine the segment scores to get the overall sentiment for each review, using the review ID linked with each segment ID.
A few choices for combining scores (non-exhaustive list): averaging the segment scores, taking the most extreme (minimum or maximum) score, or a majority vote over the segment labels.
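As one concrete sketch, continuing from the dataframe `df` above and using NLTK's VADER as the segment-level scorer (VADER and the simple mean below are just illustrative choices; any sentiment model and aggregation would do):

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")   # one-time download of VADER's lexicon
sia = SentimentIntensityAnalyzer()

# score each segment with VADER's compound score (roughly -1 to +1)
df["label"] = df["segment_text"].apply(lambda s: sia.polarity_scores(s)["compound"])

# combine segment scores per review, here by simple averaging
review_sentiment = df.groupby("review_id")["label"].mean()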
I hope this helps you.
Upvotes: 2
Reputation: 1424
For that example, can't you just split the sentence on a list of "combining" words, like "and, but, moreover..." and then run a standard analysis on each part of the split?
You would need to assume (or check) that the parts still form full sentences. Of course there are harder cases, such as "Both the room and the food we got were fine", where you would need to duplicate the end of the sentence; otherwise you are left with a fragment like "both the room" that no longer makes sense on its own.
But that sentence only carries one kind of sentiment anyway...
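A quick sketch of that idea (the word list and the use of NLTK's VADER below are just illustrative choices):

import re
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# illustrative list of "combining" words; extend as needed
SPLIT_PATTERN = r"\b(?:and|but|moreover|however|yet)\b"

sia = SentimentIntensityAnalyzer()   # requires nltk.download("vader_lexicon")

sentence = "The room we got was nice but the food was average"
parts = [p.strip() for p in re.split(SPLIT_PATTERN, sentence) if p.strip()]
for part in parts:
    print(part, "->", sia.polarity_scores(part)["compound"])

Splitting on "and" is exactly where the "Both the room ..." case above breaks down, which is why you would need the duplication trick (or some noun-phrase handling) there.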
Upvotes: 0