Reputation: 87
I ran this code, but it raises an error on the lr.fit line. Does anyone know how to fix this?
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('2019.csv')
df1 = pd.DataFrame(df,columns=['GDP per capita', 'Social support'])
lr = LogisticRegression()
columns = ['GDP per capita', 'Social support']
X = df[columns]
y = df["Score"]
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20,random_state=0)
lr.fit(X_train,y_train)
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-afa10dbaa367> in <module>
19 X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.30,random_state=0)
20
---> 21 lr.fit(X_train,y_train)
22 predictions = lr.predict(X_test)
23 accuracy = accuracy_score(y_test, predictions)
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
1526 X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
1527 accept_large_sparse=solver != 'liblinear')
-> 1528 check_classification_targets(y)
1529 self.classes_ = np.unique(y)
1530 n_samples, n_features = X.shape
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
167 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
168 'multilabel-indicator', 'multilabel-sequences']:
--> 169 raise ValueError("Unknown label type: %r" % y_type)
170
171
ValueError: Unknown label type: 'continuous'
Above is the full traceback. The only way I got this to work was to add .astype(int) to X and y; without it, I get the error shown above.
Upvotes: 0
Views: 61
Reputation: 450
I went to Kaggle and found the 2019.csv file with those two columns. The data is about people's happiness in countries around the world and how factors such as GDP per capita relate to the "happiness score". Fine, that works for me.
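Anyway, I edited 2019.csv to keep only the score and the two feature columns. Column 1 is Score, and it must contain only zeros and ones (this is very important): LogisticRegression is a classifier, so its target has to be class labels, and a continuous score is exactly what raises "Unknown label type: 'continuous'". I renamed the other two columns to GDP and SS and deleted all the other columns.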
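To see why the original code fails, note that LogisticRegression.fit checks what kind of target it was given before fitting (check_classification_targets in the traceback). scikit-learn exposes that check as sklearn.utils.multiclass.type_of_target. A small sketch with a few made-up scores (not the real 2019.csv values) shows what happens with and without the integer cast from the question:

```python
# Why LogisticRegression rejects the raw Score column: it needs discrete class labels.
from sklearn.utils.multiclass import type_of_target

# Made-up happiness scores: non-integer floats, like the original "Score" column.
scores = [7.769, 7.6, 5.2, 3.1]
print(type_of_target(scores))      # continuous -> fit() raises "Unknown label type"

# Casting to int truncates each score, which leaves several integer classes.
int_scores = [int(s) for s in scores]
print(int_scores)                  # [7, 7, 5, 3]
print(type_of_target(int_scores))  # multiclass -> fit() accepts it
```

So .astype(int) "works" only because truncation (7.769 becomes 7) silently turns the task into predicting the whole-number part of the score as a multiclass problem. Only y needs to be discrete; the features in X can stay as floats.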
The header row of 2019.csv is now: Score,GDP,SS
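Instead of editing the CSV by hand, the 0/1 column can also be derived in pandas. This is a minimal sketch under two assumptions: a median split is used as the threshold (an arbitrary choice for illustration; how the zeros and ones above were assigned is not stated), and a few made-up rows stand in for 2019.csv. The column name Happy is hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up rows standing in for 2019.csv (the real file is not reproduced here).
df = pd.DataFrame({
    "Score": [7.8, 7.6, 7.5, 6.1, 5.9, 5.2, 4.8, 4.4, 3.9, 3.1],
    "GDP":   [1.34, 1.38, 1.49, 1.12, 1.03, 0.91, 0.72, 0.64, 0.35, 0.19],
    "SS":    [1.59, 1.57, 1.58, 1.43, 1.35, 1.28, 1.12, 1.05, 0.77, 0.52],
})

# Assumption: label a row 1 ("happy") if its score is above the median, else 0.
threshold = df["Score"].median()
df["Happy"] = (df["Score"] > threshold).astype(int)

X = df[["GDP", "SS"]]  # features can stay as floats
y = df["Happy"]        # binary target, so fit() accepts it

lr = LogisticRegression()
lr.fit(X, y)
print(lr.classes_)     # [0 1]
```

The original Score column stays in the DataFrame, so a different threshold can be tried without re-editing the file.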
When I ran the code below in PyCharm on my MacBook Pro just now, it printed the accuracy:
0.46875
Process finished with exit code 0
So it is not great to begin with (about 47% accuracy), but it is a starting point that can be improved.
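One thing worth doing before tuning anything: a single 80/20 split of a small dataset gives a noisy accuracy number, so cross-validation (the question already imports cross_val_score) gives a steadier estimate. The sketch below also standardizes the features in a pipeline, which is a common step but not guaranteed to help on this data. It uses synthetic data from make_classification as a stand-in for the GDP/SS features and 0/1 labels, so its numbers say nothing about the happiness data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature binary data standing in for (GDP, SS) -> 0/1 labels.
X, y = make_classification(n_samples=150, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Scale the features, then fit logistic regression, as one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold cross-validation: five accuracy scores instead of one split's result.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the test fold never leaks into the scaling.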
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# 2019.csv has been edited down to three columns: Score (0 or 1), GDP, SS
df = pd.read_csv('2019.csv')

columns = ['GDP', 'SS']
X = df[columns]   # features
y = df['Score']   # binary labels, so LogisticRegression accepts them

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)

# This was the output:
# 0.46875
# Process finished with exit code 0
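If the goal is to predict the happiness score itself rather than a 0/1 label, a regressor accepts the continuous Score column directly, with no casting or relabeling. A sketch using LinearRegression and R² (accuracy only applies to class labels), again on made-up rows standing in for the original 2019.csv columns:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the original 2019.csv (continuous Score).
df = pd.DataFrame({
    "Score":          [7.8, 7.6, 7.5, 6.1, 5.9, 5.2, 4.8, 4.4, 3.9, 3.1],
    "GDP per capita": [1.34, 1.38, 1.49, 1.12, 1.03, 0.91, 0.72, 0.64, 0.35, 0.19],
    "Social support": [1.59, 1.57, 1.58, 1.43, 1.35, 1.28, 1.12, 1.05, 0.77, 0.52],
})

X = df[["GDP per capita", "Social support"]]
y = df["Score"]  # continuous target is fine for a regressor

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
predictions = reg.predict(X_test)

# R^2 measures fit quality for continuous predictions (1.0 is perfect).
r2 = r2_score(y_test, predictions)
print(r2)
```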
Hope this helps.
Upvotes: 1