Using MultinomialNB() from scikit-learn in Python, I want to classify documents not only by the word features in the documents but also by a sentiment dictionary (meaning just word lists, not a Python data type).
Suppose these are the documents to train on:
train_data = ['i hate who you welcome for','i adore him with all my heart','i can not forget his warmest welcome for me','please forget all these things! this house smells really weird','his experience helps a lot to complete all these tedious things', 'just ok', 'nothing+special today']
train_labels = ['Nega','Posi','Posi','Nega','Posi','Other','Other']
psentidict = ['welcome','adore','helps','complete','fantastic']
nsentidict = ['hate','weird','tedious','forget','abhor']
osentidict = ['ok','nothing+special']
I can train on these lists as shown below:
from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('clf', naive_bayes.MultinomialNB(alpha=1.0)),])
text_clf = text_clf.fit(train_data, train_labels)
Although this trains on the token counts for each label, I also want to use my sentiment dictionaries as additional classification features.
This is because features derived from the dictionaries would make it possible to handle out-of-vocabulary (OOV) words. With only crude Laplace smoothing (alpha = 1.0), overall accuracy would be severely limited.
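For reference, the scikit-learn documentation gives MultinomialNB's smoothed per-class feature estimate and decision rule as below, where N_ci is the count of feature i in class c, N_c is the total count of all features in class c, n is the number of features, and alpha is the smoothing parameter (alpha = 1 is Laplace smoothing):

```latex
\hat{\theta}_{ci} = \frac{N_{ci} + \alpha}{N_c + \alpha n},
\qquad
\hat{c} = \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} x_i \log \hat{\theta}_{ci} \Big]
```

Smoothing only helps for features that exist in the fitted vocabulary but have zero count in some class. A word the vectorizer never saw during fitting has no index i at all, so it is dropped before the classifier is consulted, and smoothing cannot make it informative.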
test_data = ['it is fantastic']  # predict() expects a list of documents, not a bare string
predicted_labels = text_clf.predict(test_data)
With the dictionary features added, it would be possible to classify the sentence above even though none of its tokens appear in the training documents.
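To make the OOV problem concrete, here is a small check using the question's own data and pipeline. It inspects the count vector CountVectorizer produces for the test sentence and compares the classifier's output with the class priors; the variable names vect, clf and x are just illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_data = ['i hate who you welcome for', 'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok', 'nothing+special today']
train_labels = ['Nega', 'Posi', 'Posi', 'Nega', 'Posi', 'Other', 'Other']

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('clf', MultinomialNB(alpha=1.0))])
text_clf.fit(train_data, train_labels)

test_data = ['it is fantastic']
vect = text_clf.named_steps['vect']
clf = text_clf.named_steps['clf']

# None of 'it', 'is', 'fantastic' is in the fitted vocabulary, so the
# count vector for the test sentence has no non-zero entries.
x = vect.transform(test_data)
print(x.nnz)

# With an all-zero vector the likelihood sum is empty, so the posterior
# equals the class prior and the prediction is the most frequent class.
print(clf.classes_, text_clf.predict_proba(test_data)[0])
print(np.exp(clf.class_log_prior_))
print(text_clf.predict(test_data))
```

In other words, whatever label comes out for a fully OOV sentence is decided by class frequency alone, not by anything in the sentence.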
How can I add features from psentidict, nsentidict, and osentidict to a Multinomial Naive Bayes classifier? (Training on them as if they were documents could distort the estimates, so I think it is better to find another way.)
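One possible approach, sketched here as an assumption rather than a known-best answer, is to append a few aggregate features with scikit-learn's FeatureUnion: one column per dictionary holding how many of its entries occur in the document. Because the column is shared by every word in a list, words seen in training (e.g. 'hate', 'forget') teach the classifier what that column means, and an unseen dictionary word such as 'abhor' can then move the prediction. LexiconCounter is a hypothetical helper written for this sketch; it reuses CountVectorizer with a fixed vocabulary and treats '+' as a space so that 'nothing+special' matches as a bigram.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


class LexiconCounter(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one column per word list, holding the
    number of occurrences of any entry of that list in each document."""

    def __init__(self, lexicons=None):
        self.lexicons = lexicons

    def fit(self, X, y=None):
        # One fixed-vocabulary vectorizer per list; unigrams plus bigrams so
        # that 'nothing+special' (rewritten as 'nothing special') can match.
        self.vectorizers_ = [
            CountVectorizer(vocabulary=[w.replace('+', ' ') for w in lex],
                            ngram_range=(1, 2)).fit(X)
            for lex in self.lexicons
        ]
        return self

    def transform(self, X):
        # Collapse each list's per-word counts into a single count column.
        cols = [sp.csr_matrix(np.asarray(vec.transform(X).sum(axis=1)).reshape(-1, 1))
                for vec in self.vectorizers_]
        return sp.hstack(cols).tocsr()


train_data = ['i hate who you welcome for', 'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok', 'nothing+special today']
train_labels = ['Nega', 'Posi', 'Posi', 'Nega', 'Posi', 'Other', 'Other']
psentidict = ['welcome', 'adore', 'helps', 'complete', 'fantastic']
nsentidict = ['hate', 'weird', 'tedious', 'forget', 'abhor']
osentidict = ['ok', 'nothing+special']

# Word counts and dictionary counts side by side, both non-negative,
# so MultinomialNB can consume the concatenated matrix directly.
text_clf = Pipeline([
    ('features', FeatureUnion([
        ('words', CountVectorizer()),
        ('lexicon', LexiconCounter(lexicons=[psentidict, nsentidict, osentidict])),
    ])),
    ('clf', MultinomialNB(alpha=1.0)),
])
text_clf.fit(train_data, train_labels)

# Baseline without dictionary features, for comparison.
baseline = Pipeline([('vect', CountVectorizer()),
                     ('clf', MultinomialNB(alpha=1.0))]).fit(train_data, train_labels)

lex_rows = LexiconCounter(lexicons=[psentidict, nsentidict, osentidict]).fit_transform(train_data)
print(lex_rows.toarray())
print(baseline.predict(['i abhor it']), text_clf.predict(['i abhor it']))
```

On this tiny data set the baseline can only fall back to the majority prior for 'i abhor it', while the dictionary column lets the unseen word 'abhor' count as negative evidence. Two caveats: dictionary words that also appear in training documents are now counted twice, which strains the Naive Bayes independence assumption, and the relative influence of the extra columns may need tuning (FeatureUnion's transformer_weights parameter is one way to scale them).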
I believe there is no other way to include these features in your Multinomial Naive Bayes model. This is simply because you want to associate some sort of label with the features (say 'positive' for the values in psentidict, and so on), and that can only be achieved by training your model on those pairs of features and labels. What you can do is improve the model by creating sentences that contain those words, rather than using the words directly. For example, for the word 'hate', you could use 'I hate you with all my heart' with the label 'negative', instead of only the pair 'hate':'negative'. So you have to create more such examples for your dataset.
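A minimal sketch of the mechanics of this augmentation, assuming a single hypothetical template sentence per dictionary word (make_examples and the template 'i find this {}' are illustrative inventions, not part of the answer; hand-written sentences per word, as suggested above, would be better):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_data = ['i hate who you welcome for', 'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok', 'nothing+special today']
train_labels = ['Nega', 'Posi', 'Posi', 'Nega', 'Posi', 'Other', 'Other']
psentidict = ['welcome', 'adore', 'helps', 'complete', 'fantastic']
nsentidict = ['hate', 'weird', 'tedious', 'forget', 'abhor']
osentidict = ['ok', 'nothing+special']


def make_examples(words, label, template='i find this {}'):
    """Hypothetical helper: wrap each dictionary word in a template sentence
    and pair every sentence with the dictionary's label."""
    return [template.format(w) for w in words], [label] * len(words)


# Start from copies so the original training lists stay untouched.
aug_data, aug_labels = list(train_data), list(train_labels)
for words, label in [(psentidict, 'Posi'), (nsentidict, 'Nega'), (osentidict, 'Other')]:
    sents, labs = make_examples(words, label)
    aug_data += sents
    aug_labels += labs

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('clf', MultinomialNB(alpha=1.0))]).fit(aug_data, aug_labels)

# 'fantastic' is now part of the vocabulary, so it is no longer OOV.
print('fantastic' in text_clf.named_steps['vect'].vocabulary_)
print(text_clf.predict(['it is fantastic']))
```

The trade-off the question worries about is visible here: the template words ('find', 'this') are also added to every class, and each synthetic sentence counts as a full document in the class priors.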
Hope this link helps.