user18071300
user18071300

Reputation:

How to get string as Y output using Linear regression Python

I have this rating prediction model using linear regression

status = pd.DataFrame({'rating': [10.5,20.30,30.12,40.24,50.55,60.6,70.2], 'B': ['Bad','Not bad','Good','I like it','Very good','The best','Deserve an oscar']})

x = status.iloc[:,:-1].values
y = status.iloc[:,-1].values

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,train_size=0.4,random_state=0)

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(x,y)

input = 40.24
lr.predict([[input]])

So I have 40.24 as my input for X value I was expecting for 'I like it' as the output but it throws error instead because the expected output is a string, here's the error: ValueError: could not convert string to float: 'Bad'. How do I make it capable of having string as output?

Upvotes: 0

Views: 1073

Answers (3)

Edwin Cheong
Edwin Cheong

Reputation: 979

Hi thats because sckitlearn or rather machine learning labels require numbers as an input, i am not sure what the classes are in this case but you can use the onehotencoder from sckitlearn

Also do change it to logistic regression

from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

# 1. INSTANTIATE
enc = OneHotEncoder()

# 2. FIT
enc.fit(y)

# 3. Transform
onehotlabels = enc.transform(y).toarray()
onehotlabels.shape

clf = LogisticRegression(random_state=0).fit(x, onehotlabels)

or you can just manually map it out which ever way you prefer (e.g Bad -> 0, Good -> 1)

Upvotes: 1

im_vutu
im_vutu

Reputation: 412

Your labels are categorical where regression labels should be continuous numerical. You can consider to see it as a classification problem rather than regression.

Upvotes: 0

Atul Mishra
Atul Mishra

Reputation: 86

You cannot do a Linear Regression if you have Target feature as a Categorical D-Type. That is the first rule of performing a Linear Regression that you should have Continuous Target feature as the y=mx+c function only takes in numbers as input and tests the function against the numerical items and predicts the numerical item.

That is why it gets trained but fails to predict.

You need to encode your target feature. Please self-study these concepts.

Hope this helps.

Upvotes: 0

Related Questions