Shantanu

Reputation: 11

Text Mining basic questions

Text Mining specific questions:

  1. I work in CRM in the automotive sector, where we receive a lot of unstructured survey data. My first question: is there a domain-specific dictionary that can be used to map positive and negative words for sentiment analysis? If so, please point me to it.

  2. How do we handle phrases like "not bad" and "not good" in sentiment analysis? "Not bad" essentially means good, but a word-by-word lookup will give it a negative score.

  3. How do we account for words in the vicinity of important words? For example, "was not helpful" and "very helpful" should get a negative and a positive score respectively, because what matters is the "not" or "very" surrounding the word "helpful". Some call this approach "opinion mining". How is this done in R so that such cases are handled?

Anyone's help will be really appreciated.

Upvotes: 1

Views: 184

Answers (1)

nachiappanpl

Reputation: 783

  1. There may be a few context-specific corpora that list good and bad keywords, but they are not tailor-made for every dataset, and in some cases the results can be very disappointing. For this type of problem, I would suggest taking the machine learning route. There are many classification techniques you can apply; I would suggest trying Naive Bayes and SVM and, if you have time, a text-based CNN (a little tricky, but it can be very accurate). Again, all of these models depend heavily on your training corpus. An example of which is available here
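
To make the Naive Bayes suggestion concrete, here is a minimal sketch in plain Python (separate from the example linked above). It is a multinomial Naive Bayes with add-one smoothing, written from scratch so that it has no dependencies. The training sentences are made up for illustration and the function names are my own; in practice you would train on your own labeled survey responses and probably use an existing library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would clean more.
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-class word counts, per-class document counts and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: share of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the token.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
TRAIN = [
    ("service was quick and staff very helpful", "positive"),
    ("great experience friendly staff", "positive"),
    ("car was ready on time very happy", "positive"),
    ("staff was rude and service slow", "negative"),
    ("long wait and car not ready", "negative"),
    ("poor service very disappointed", "negative"),
]

model = train_naive_bayes(TRAIN)
print(predict("friendly and helpful staff", model))   # positive
print(predict("slow service and rude staff", model))  # negative
```

Note that a bag-of-words model like this treats "not" as just another token, so on its own it does not solve the negation problem from questions 2 and 3.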

2 & 3. Try building a dependency tree; the Stanford parser does a great job of analysing the grammar of a sentence. For example, when I build a dependency tree for

"I don't like buffet, instead I'll go for alacarte that's very economical"

[Image: dependency tree for the sentence above]

From the above output we can see that the sentiment word 'like' is modified by the negation "n't" ('not'), and that the word 'economical' is qualified by the adverb 'very'. There are more than 50 relations, but you only have to worry about 4 or 5 of them for sentiment analysis. This link would be of help in explaining what those relations are. You can play with the Stanford parser and use its APIs.
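
As a rough sketch of how those relations could be used once you have them, the Python snippet below flips a word's polarity when it has a negation dependent and scales it when it has an intensifying adverb. Everything in it is an assumption for illustration: the triples are hand-written stand-ins for parser output (no parser is called), the relation labels "neg" and "advmod" follow the older Stanford dependencies style and may differ in your parser version, and the tiny lexicon and intensifier weights are invented.

```python
# Hypothetical word-level polarity lexicon and intensifier weights.
LEXICON = {"like": 1.0, "economical": 1.0, "helpful": 1.0, "good": 1.0, "bad": -1.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5}

# Hand-written (head, relation, dependent) triples for
# "I don't like buffet, instead I'll go for alacarte that's very economical".
# Only a subset of the relations is listed.
TRIPLES = [
    ("like", "nsubj", "I"),
    ("like", "aux", "do"),
    ("like", "neg", "n't"),
    ("like", "dobj", "buffet"),
    ("economical", "advmod", "very"),
]


def score_words(triples, lexicon, intensifiers):
    """Return {sentiment word: score} adjusted for negation and intensifiers."""
    scores = {}
    # Seed every sentiment word that appears as a head or a dependent.
    # Words are keyed by text for brevity; real code should key by token index.
    for head, _, dep in triples:
        for word in (head, dep):
            if word.lower() in lexicon:
                scores.setdefault(word.lower(), lexicon[word.lower()])
    # Adjust each sentiment word according to its modifiers.
    for head, rel, dep in triples:
        key = head.lower()
        if key not in scores:
            continue
        if rel == "neg":
            scores[key] *= -1  # "not bad" -> positive, "not helpful" -> negative
        elif rel == "advmod" and dep.lower() in intensifiers:
            scores[key] *= intensifiers[dep.lower()]  # "very helpful" -> stronger
    return scores


print(score_words(TRIPLES, LEXICON, INTENSIFIERS))
# {'like': -1.0, 'economical': 1.5}

# The phrases from the question, again with hand-written triples.
print(score_words([("bad", "neg", "not")], LEXICON, INTENSIFIERS))
# {'bad': 1.0}
print(score_words([("helpful", "cop", "was"), ("helpful", "neg", "not")], LEXICON, INTENSIFIERS))
# {'helpful': -1.0}
print(score_words([("helpful", "advmod", "very")], LEXICON, INTENSIFIERS))
# {'helpful': 1.5}
```

The same idea carries over to R or any other language: the parser supplies the triples, and the scoring step is a small amount of bookkeeping over the handful of relations that matter.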

Upvotes: 0
