I am trying to find a relationship between UCAS points and final university mark (Final) using linear regression. I am following this tutorial.
I get the following error at this line:
plt.scatter(X_test, Y_test, color='black')
could not convert string to float:
I have checked the types: the values in the "Total UCAS Points" column are of class str, and those in "Final" are of type numpy.float64.
I have tried to convert the str values to float by doing the following:
pd.to_numeric("Total UCAS Points")
But I keep getting the error message:
Unable to parse string "Total UCAS Points" at position 0
I have also tried ignoring the error, but this does not change the type to float; the values remain str.
Here is a sample of my CSV file:
Total UCAS Points: 280 280 240 240 360 360 360 360 630
Final: 58 46 62 64 48 56 54 30
df = df.replace(np.nan, -1)
X = df['Total UCAS Points']
Y = df['Final']
pd.to_numeric("Total UCAS Points")
print(type(Y[2]))
X=X.reshape(len(X),1)
Y=Y.reshape(len(Y),1)
# Split the data into training/testing sets
X_train = X[:-2500]
X_test = X[-2500:]
# Split the targets into training/testing sets
Y_train = Y[:-2500]
Y_test = Y[-2500:]
# Plot outputs
plt.scatter(X_test, Y_test, color='black')
Upvotes: 1
Views: 2646
Reputation: 405775
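You need to pass the data itself (a list or Series of values) to to_numeric, not the name of a column in your data frame. The string "Total UCAS Points" is treated as the value to parse, which is why parsing fails at position 0. Try this: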
X = pd.to_numeric(X) # in place of pd.to_numeric("Total UCAS Points")
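For illustration, here is a minimal sketch of the whole conversion step. The small data frame is hypothetical and only imitates the sample in the question, with the points column stored as strings and one unparseable entry added. It assumes that rows which cannot be parsed should be dropped; errors="coerce" turns such entries into NaN so they can be filtered out. It also assumes a recent pandas version, where a Series has no reshape method, so the values are converted to a NumPy array before reshaping.

```python
import pandas as pd

# Hypothetical data imitating the question's CSV; the points column holds strings.
df = pd.DataFrame({
    "Total UCAS Points": ["280", "240", "360", "n/a"],
    "Final": [58.0, 46.0, 62.0, 64.0],
})

# Pass the Series itself, not the column name.
# errors="coerce" replaces unparseable entries with NaN instead of raising.
X = pd.to_numeric(df["Total UCAS Points"], errors="coerce")
Y = df["Final"]

# Keep only the rows where the conversion succeeded.
mask = X.notna()

# Convert to NumPy arrays and reshape into single-column 2-D arrays.
X = X[mask].to_numpy().reshape(-1, 1)
Y = Y[mask].to_numpy().reshape(-1, 1)

print(X.dtype, X.shape, Y.shape)
```

Note that to_numeric returns a new object rather than modifying the column in place, so the result has to be assigned back, as in the line above.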
Upvotes: 3