taga

Reputation: 3885

Classifying comments into positive and negative using Scikit-Learn with Python

I am trying to write code that classifies comments as positive or negative (0 for negative, 1 for positive).

I have a pandas dataframe with two columns, comments and results. I am using Logistic Regression from Python's Scikit-Learn library (I will try other classifiers such as Decision Tree, SVM, KNN...), but it gives me an error (I want to do this without sentiment analysis). I think the problem is that I am passing in a string rather than a number. My program should take a comment (a string value) and evaluate whether it is 0 or 1. This is the code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model



full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)

This is the error that I get:

ValueError: X has 1 features per sample; expecting 5155

I have also tried this:

input_values = ["I love this comment"]

prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]

And I get this error:

ValueError: X has 3 features per sample; expecting ...
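
A minimal sketch of where the mismatched feature counts come from, assuming the three-comment toy data from the code above rather than the full dataset. The variable names (train_width, refit_width, reused_width) are made up for illustration. Fitting a vectorizer on the new comment alone builds a fresh vocabulary from that one comment, while transform reuses the vocabulary learned during training:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_comments = ["This is bad", "Good job", "I do not like this"]
new_comment = ["I love this comment"]

# Learn the vocabulary from the training comments.
cv = CountVectorizer()
train_width = cv.fit_transform(train_comments).shape[1]

# Fitting again on the new comment alone learns a different, smaller
# vocabulary (a separate vectorizer is used here so cv stays intact).
refit_width = CountVectorizer().fit_transform(new_comment).shape[1]

# transform() maps the new comment onto the training vocabulary,
# so the column count matches what the model was fitted on.
reused_width = cv.transform(new_comment).shape[1]

print(train_width, refit_width, reused_width)
```

Only the last matrix has the same number of columns as the training matrix, which is what predict checks.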

Upvotes: 2

Views: 1139

Answers (1)

vb_rises

Reputation: 1907

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model

full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

# Use transform, not fit_transform: this reuses the vocabulary learned from the training comments
prediction = logistic_regression.predict(cv.transform(input_values)) #adding values for prediction
prediction = prediction[0]
print(prediction)

Output: 0
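
One possible way to avoid this class of mistake is to bundle the vectorizer and the classifier into a single scikit-learn pipeline, so raw strings go in and the same fitted vocabulary is always applied before prediction. This is only a sketch on the same toy data; the name text_classifier is an assumption for illustration, not something from the question:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["This is bad", "Good job", "I do not like this"]
labels = [0, 1, 0]

# The pipeline calls fit_transform on the vectorizer during fit(),
# and only transform during predict().
text_classifier = make_pipeline(
    CountVectorizer(),
    LogisticRegression(solver="lbfgs"),
)
text_classifier.fit(comments, labels)

# Raw strings can be passed directly; no manual vectorizing step.
prediction = text_classifier.predict(["I love this comment"])[0]
print(prediction)
```

With only three training comments the prediction is not meaningful; the point is that the vectorizing step can no longer be re-fitted by accident at prediction time.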

Upvotes: 5
