Reputation: 191
I have been trying to implement a polynomial regression model in Python in the Spyder IDE. Everything works until the end, where the call to NumPy's arange function raises the error shown in the traceback below.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
dataset = pd.read_csv("Position_Salaries.csv")
X = dataset.iloc[:, 1:2]
y = dataset.iloc[:, 2]
#fitting the linear regression model
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X,y)
#fitting the polynomial linear Regression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)
#visualising the linear regression results
plt.scatter(X,y ,color = 'red')
plt.plot(X,lin_reg.predict(X), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()
# the code fails here, on the np.arange line
# visualising the polynomial results
X_grid = np.arange(min(X),max(X), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X,y ,color = 'red')
plt.plot(X_grid,lin_reg2.predict( poly_reg.fit_transform(X_grid)), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()
I expected it to run without any error.
Error traceback:
TypeError Traceback (most recent call last)
<ipython-input-24-428026f3698c> in <module>()
----> 1 x_grid = np.arange(min(x),max(x),0.1)
2 print(x_grid, x)
3 x_grid = x_grid.reshape((len(x_grid),1))
4
5 plt.scatter(x, y, color = 'red')
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Upvotes: 3
Views: 10442
Reputation: 219
Try the following code:
X_grid = np.arange(float(min(X ['Level'])), float(max(X['Level'])), 0.01, dtype= float)
Upvotes: 0
Reputation: 1
Check if you are getting the values from the dataset. Remember it is:
x = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
not:
x = dataset.iloc[:, 1:-1]
y = dataset.iloc[:, -1]
Without ".values" you get the strings ("str") that your error message is showing.
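As a rough illustration of the difference, here is a minimal sketch on a made-up three-row frame standing in for Position_Salaries.csv (the column names and numbers are assumptions, not the real file). It compares Python's built-in min on the DataFrame with the min/max methods of the array returned by ".values":

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv (illustrative values only).
dataset = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager"],
    "Level": [1, 2, 3],
    "Salary": [45000, 50000, 60000],
})

X_frame = dataset.iloc[:, 1:2]          # still a pandas DataFrame
X_array = dataset.iloc[:, 1:2].values   # NumPy array of shape (3, 1)

# Built-in min() iterates over a DataFrame's column labels, so it returns a string.
label_min = min(X_frame)

# The array's own min()/max() methods work on the numbers.
value_min = X_array.min()
value_max = X_array.max()

# Numeric bounds are valid arguments for np.arange.
X_grid = np.arange(value_min, value_max, 0.1).reshape(-1, 1)
```

Here label_min is the column name "Level", which is what np.arange ends up trying to subtract in the question, while value_min and value_max are plain numbers.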
Upvotes: 0
Reputation: 1
Replace,
X = dataset.iloc[:, 1:2] and y = dataset.iloc[:, 2]
With,
X = dataset.iloc[:, 1:2].values and y = dataset.iloc[:, 2].values
Upvotes: 0
Reputation: 95
Use this:
x = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, -1:].values
Because x and y should only contain numerical values. Using dataset.iloc[].values means the Level and Salary column names are not included in the x and y data.
Upvotes: 0
Reputation: 11
Try out this code. This worked for me as I am also doing the Udemy lecture.
X_grid = np.arange(min(X ['Level']), max(X['Level']), 0.01, dtype= float)
X_grid = X_grid.reshape((len(X_grid), 1))
#plotting
plt.scatter(X,y, color = 'red')
plt.plot(X_grid, lin_reg2.predict(poly_reg.fit_transform(X_grid)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
Upvotes: 1
Reputation: 231375
If this error occurs in:
np.arange(min(X),max(X), 0.1)
it must be because min(X) and max(X) are strings.
In [385]: np.arange('123','125')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-385-0a55b396a7c3> in <module>
----> 1 np.arange('123','125')
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Since X is a pandas object (dataframe or series?) this isn't too surprising. pandas freely uses object dtype when it can't use a number (and doesn't use numpy string dtype):
X = dataset.iloc[:, 1:2]
np.arange(np.array('123'),np.array('125'))
produces a different error, about 'U3' dtypes.
The fact that the LinearRegression calls work with this X is a little puzzling, but I don't know how it sanitizes its inputs.
In any case, I'd check min(X) before the arange call, looking at its value and type. If it is a string, then explore the X in greater detail.
In a comment you say: there are two columns and all have integers from 1-10 and 45k to 100k.
Is that '45k' an integer, or a string?
Let's do a test on a dummy dataframe:
In [392]: df = pd.DataFrame([[1,45000],[2,46000],[3,47000]], columns=('A','B'))
In [393]: df
Out[393]:
A B
0 1 45000
1 2 46000
2 3 47000
In [394]: min(df)
Out[394]: 'A'
In [395]: max(df)
Out[395]: 'B'
min and max produce strings, derived from the column names.
In contrast, the fit functions are probably working with the array values of the dataframe:
In [397]: df.to_numpy()
Out[397]:
array([[ 1, 45000],
[ 2, 46000],
[ 3, 47000]])
Don't assume things should work! Test, debug, print suspect values.
min/max are the python functions. The numpy ones operate in a dataframe-sensitive way:
In [399]: np.min(df) # delegates to df.min()
Out[399]:
A 1
B 45000
dtype: int64
In [400]: np.max(df)
Out[400]:
A 3
B 47000
dtype: int64
though those aren't appropriate inputs to arange either.
What exactly do you intend to produce with this arange call?
arange on the range of one column of the dataframe works:
In [405]: np.arange(np.min(df['A']), np.max(df['A']),.1)
Out[405]:
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
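If the goal of the arange call is a dense grid for drawing a smooth polynomial curve, as the plotting code in the question suggests, the single-column approach could be put together roughly as follows. This is only a sketch: the frame below is made-up data standing in for Position_Salaries.csv, and it calls transform (rather than fit_transform) on the grid so that the already fitted PolynomialFeatures object is reused.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up stand-in for Position_Salaries.csv (illustrative values only).
levels = np.arange(1, 11)
dataset = pd.DataFrame({
    "Position": [f"P{i}" for i in levels],
    "Level": levels,
    "Salary": 1000.0 * levels ** 2 + 40000.0,
})

# Work with plain NumPy arrays so min()/max() act on numbers, not column labels.
X = dataset[["Level"]].to_numpy(dtype=float)   # shape (10, 1)
y = dataset["Salary"].to_numpy(dtype=float)    # shape (10,)

# Fit the degree-4 polynomial model as in the question.
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression().fit(X_poly, y)

# Dense grid over the observed levels, as a column vector.
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)

# Reuse the fitted transformer on the grid and predict the curve.
y_grid = lin_reg2.predict(poly_reg.transform(X_grid))
```

X_grid and y_grid can then be passed to plt.plot in place of the expressions used in the question.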
Upvotes: 2
Reputation: 6476
You should check what is in X and y. They are probably Series objects containing strings. What you want is to extract the values in X and y and convert them to floats/ints before doing anything with them.
Something like:
X = dataset.iloc[:, 1:2].astype(float)
y = dataset.iloc[:, 2].astype(float)
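One way to do the suggested check is sketched below on a small made-up frame (the column names and values are assumptions standing in for the real CSV). It prints nothing, but collects the dtypes and the results of min so they can be inspected. Note that in this sketch the conversion changes the dtype, yet built-in min on the DataFrame X still returns a column label, because iterating a DataFrame yields its labels; iterating the Series y yields values.

```python
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv (illustrative values only).
dataset = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager"],
    "Level": [1, 2, 3],
    "Salary": [45000, 50000, 60000],
})

X = dataset.iloc[:, 1:2].astype(float)   # DataFrame with one float column
y = dataset.iloc[:, 2].astype(float)     # Series of floats

x_dtypes = X.dtypes                # per-column dtypes of X
frame_min = min(X)                 # iterates column labels of the DataFrame
series_min = min(y)                # iterates the values of the Series
column_min = X["Level"].min()      # numeric minimum of the single column
```

Inspecting frame_min and column_min side by side shows whether a value passed to np.arange is a label or a number.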
Upvotes: 0
Reputation: 2319
You need to make sure your inputs have the correct type. It seems to me that both values in the question are of type str. Maybe try converting them to floats with float(x) or a similar function?
Upvotes: 0