I am trying to write code that classifies comments as positive or negative (0 for negative, 1 for positive).
I have a pandas DataFrame with two columns, Comment and Result. I have used LogisticRegression from the Python scikit-learn library (I will also try other classifiers such as Decision Tree, SVM and KNN), but it gives me an error. I want to do this without sentiment analysis. I think the problem is that I am passing in a string rather than a number.
My program should take a comment (a string value) and predict whether it is 0 or 1.
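To illustrate the string-versus-number point, here is a minimal sketch using the three toy comments from the code below. CountVectorizer turns each comment into a row of word counts over a vocabulary it learns during fit, so the model only ever sees numbers. Note that the default token pattern keeps only tokens of two or more characters, so "I" is dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

comments = ["This is bad", "Good job", "I do not like this"]

cv = CountVectorizer()
X = cv.fit_transform(comments)  # sparse matrix: one row per comment, one column per known word

# vocabulary_ maps each word to its column index; sort by that index to get column order
columns = sorted(cv.vocabulary_, key=cv.vocabulary_.get)
print(columns)  # ['bad', 'do', 'good', 'is', 'job', 'like', 'not', 'this']
print(X.toarray())  # dense view of the count matrix, shape (3, 8)
```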
This is the code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
"Result":[0,1,0]})
features = full_comment_data["Comment"]
results = full_comment_data["Result"]
cv = CountVectorizer()
features = cv.fit_transform(features)
logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)
input_values = ["I love this comment"] #This value should be evaluated
prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)
This is the error that I get:
ValueError: X has 1 features per sample; expecting 5155
I have also tried this:
input_values = ["I love this comment"]
prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]
And I get this error:
ValueError: X has 3 features per sample; expecting ...
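The second error can be reproduced with a small sketch on the same toy data. Calling fit_transform on the new comment builds a fresh vocabulary from that comment alone, so its column count no longer matches the training matrix. Here a separate CountVectorizer stands in for the refit so that the training vocabulary stays intact for comparison.

```python
from sklearn.feature_extraction.text import CountVectorizer

train = ["This is bad", "Good job", "I do not like this"]
new = ["I love this comment"]

cv = CountVectorizer()
X_train = cv.fit_transform(train)  # learns an 8-word vocabulary from the training comments

# Mimics cv.fit_transform(input_values): a new vocabulary ("comment", "love", "this")
refit = CountVectorizer().fit_transform(new)

# transform() reuses the training vocabulary; unseen words ("love", "comment") are ignored
X_new = cv.transform(new)

print(X_train.shape[1], refit.shape[1], X_new.shape[1])  # 8 3 8
```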
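Use cv.transform instead of cv.fit_transform on the new comment. fit_transform learns a new vocabulary from whatever input it is given, so the resulting matrix has a different number of columns than the one the model was trained on; transform encodes the new comment with the vocabulary already learned from the training data: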
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import linear_model
full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
"Result":[0,1,0]})
features = full_comment_data["Comment"]
results = full_comment_data["Result"]
cv = CountVectorizer()
features = cv.fit_transform(features)
logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)
input_values = ["I love this comment"] #This value should be evaluated
prediction = logistic_regression.predict(cv.transform(input_values)) #transform (not fit_transform) reuses the vocabulary learned from the training comments
prediction = prediction[0]
print(prediction)
Output: 0
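Since the question also mentions trying Decision Tree, SVM and KNN, one option worth considering is a scikit-learn Pipeline, which calls fit_transform on the vectorizer during fit and only transform during predict, so the vocabulary mismatch cannot happen. This is a sketch on the toy data; the specific classifier settings (random_state=0, a linear-kernel SVC, n_neighbors=1 because there are only three training rows) are illustrative assumptions, not tuned choices.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

full_comment_data = pd.DataFrame({"Comment": ["This is bad", "Good job", "I do not like this"],
                                  "Result": [0, 1, 0]})

# Illustrative classifier choices; n_neighbors=1 because the toy data has only 3 rows
classifiers = {
    "logistic_regression": LogisticRegression(solver="lbfgs"),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(kernel="linear"),
    "knn": KNeighborsClassifier(n_neighbors=1),
}

predictions = {}
for name, clf in classifiers.items():
    # A fresh CountVectorizer per pipeline: fit() learns its vocabulary, predict() only transforms
    model = make_pipeline(CountVectorizer(), clf)
    model.fit(full_comment_data["Comment"], full_comment_data["Result"])
    predictions[name] = model.predict(["I love this comment"])[0]

print(predictions)
```

With three training comments the individual predictions are not meaningful; the point is that each model receives input with the same columns it was trained on.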