haroon khan

Reputation: 191

How to fix numpy TypeError: unsupported operand type(s) for -: 'str' and 'str'

I have been trying to implement a polynomial regression model in Python in the Spyder IDE. Everything works until the end, when I call NumPy's arange function and get the following error:

import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np

dataset = pd.read_csv("Position_Salaries.csv")
X = dataset.iloc[:, 1:2]
y = dataset.iloc[:, 2]

#fitting the linear regression model
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X,y)

#fitting the polynomial linear Regression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)

#visualising the linear regression results
plt.scatter(X,y ,color = 'red')
plt.plot(X,lin_reg.predict(X), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()

#the code fails on the np.arange line below
#visualisng the polynomial results
X_grid = np.arange(min(X),max(X), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X,y ,color = 'red')
plt.plot(X_grid,lin_reg2.predict( poly_reg.fit_transform(X_grid)), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()

I expected it to run without any error.

Error traceback:

TypeError                                 Traceback (most recent call last)

<ipython-input-24-428026f3698c> in <module>()
----> 1 x_grid = np.arange(min(x),max(x),0.1)
      2 print(x_grid, x)
      3 x_grid = x_grid.reshape((len(x_grid),1))
      4 
      5 plt.scatter(x, y, color = 'red')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Upvotes: 3

Views: 10442

Answers (8)

ElhamMotamedi

Reputation: 219

Try the following code:

X_grid = np.arange(float(min(X['Level'])), float(max(X['Level'])), 0.01, dtype=float)

Upvotes: 0

Gaurav kumar Sharma

Reputation: 1

Check if you are getting the values from the dataset. Remember it is:

x = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

not:

x = dataset.iloc[:, 1:-1]
y = dataset.iloc[:, -1]

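Without ".values", x stays a pandas DataFrame, and Python's built-in min() and max() iterate over its column labels. Those labels are the strings ("str") that your error message is showing.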
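
To see where the strings come from, here is a minimal sketch. The DataFrame is a made-up stand-in for Position_Salaries.csv (its column names and values are assumptions, not the real file):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv
dataset = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": [1, 2, 3],
    "Salary": [45000, 60000, 100000],
})

X_frame = dataset.iloc[:, 1:2]          # still a DataFrame
X_array = dataset.iloc[:, 1:2].values   # plain NumPy array, shape (3, 1)

# Built-in min() iterates over a DataFrame's column labels
label_min = min(X_frame)
print(repr(label_min))   # 'Level'

# The array's own methods reduce over the numbers instead
array_min = X_array.min()
array_max = X_array.max()
X_grid = np.arange(array_min, array_max, 0.1).reshape(-1, 1)
print(X_grid.shape)
```

Using the array's .min() and .max() methods, rather than the built-ins, returns plain scalars that np.arange accepts directly.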

Upvotes: 0

Ashwin Pradhan

Reputation: 1

Replace:

X = dataset.iloc[:, 1:2]
y = dataset.iloc[:, 2]

with:

X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

Upvotes: 0

Manoj kumawat

Reputation: 95

Use this:

x = dataset.iloc[:, 1:2].values

y = dataset.iloc[:, -1:].values

x and y should contain only numerical values.

Calling .values on dataset.iloc[...] returns plain NumPy arrays, so the Level and Salary column names are not carried along in x and y.
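
As a rough illustration of what these slices return, here is a sketch on a made-up DataFrame (the data is an assumption standing in for the real CSV). It also shows that `-1:` keeps y two-dimensional while a bare `-1` gives a one-dimensional array:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv
dataset = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": [1, 2, 3],
    "Salary": [45000, 60000, 100000],
})

x = dataset.iloc[:, 1:2].values    # column slice -> 2-D array
y_2d = dataset.iloc[:, -1:].values # slice of last column -> 2-D array
y_1d = dataset.iloc[:, -1].values  # single column -> 1-D array

print(type(x).__name__, x.shape)
print(y_2d.shape, y_1d.shape)
```

Either shape of y should be usable as a regression target in scikit-learn, though the 1-D form is the more common choice.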

Upvotes: 0

Debashis Sahoo

Reputation: 11

Try out this code. This worked for me as I am also doing the Udemy lecture.

X_grid = np.arange(min(X['Level']), max(X['Level']), 0.01, dtype=float)
X_grid = X_grid.reshape((len(X_grid), 1))

#plotting
plt.scatter(X,y, color = 'red')
plt.plot(X,lin_reg2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')

Upvotes: 1

hpaulj

Reputation: 231375

If this error occurs in:

np.arange(min(X),max(X), 0.1)

it must be because min(X) and max(X) are strings.

In [385]: np.arange('123','125')                                                                                
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-385-0a55b396a7c3> in <module>
----> 1 np.arange('123','125')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Since X is a pandas object (dataframe or series?) this isn't too surprising. pandas freely uses object dtype when it can't use a number (and doesn't use numpy string dtype):

X = dataset.iloc[:, 1:2]

np.arange(np.array('123'),np.array('125')) produces a different error, about 'U3' dtypes.

The fact that the LinearRegression calls work with this X is a little puzzling, but I don't know how it sanitizes its inputs.

In any case, I'd check min(X) before the arange call, looking at its value and type. If it is a string, then explore the X in greater detail.


In a comment you say: there are two columns and all have integers from 1-10 and 45k to 100k. Is that '45k' an integer, or a string?


Let's do a test on a dummy dataframe:

In [392]: df = pd.DataFrame([[1,45000],[2,46000],[3,47000]], columns=('A','B'))                                 
In [393]: df                                                                                                    
Out[393]: 
   A      B
0  1  45000
1  2  46000
2  3  47000
In [394]: min(df)                                                                                               
Out[394]: 'A'
In [395]: max(df)                                                                                               
Out[395]: 'B'

min and max produce strings - derived from the column names.

In contrast the fit functions are probably working with the array values of the dataframe:

In [397]: df.to_numpy()                                                                                         
Out[397]: 
array([[    1, 45000],
       [    2, 46000],
       [    3, 47000]])

Don't assume things should work! Test, debug, print suspect values.


The min/max used above are the built-in Python functions. The numpy ones operate in a dataframe-aware way:

In [399]: np.min(df)      # delegates to df.min()                                                                                      
Out[399]: 
A        1
B    45000
dtype: int64
In [400]: np.max(df)                                                                                            
Out[400]: 
A        3
B    47000
dtype: int64

though those aren't appropriate inputs to arange either.

What exactly do you intend to produce with this arange call?

arange on the range of one column of the dataframe works:

In [405]: np.arange(np.min(df['A']), np.max(df['A']),.1)                                                        
Out[405]: 
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
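
The single-column approach could be wrapped up as in the sketch below. make_grid is a hypothetical helper (it is not part of NumPy or pandas), and the DataFrame is made up for illustration. It refuses non-numeric columns up front instead of failing inside arange:

```python
import numpy as np
import pandas as pd

def make_grid(column, step=0.1):
    """Hypothetical helper: evenly spaced grid over one numeric column,
    returned as an (n, 1) array suitable for predict()."""
    if not pd.api.types.is_numeric_dtype(column):
        raise TypeError(f"column has non-numeric dtype {column.dtype}")
    # Series.min()/max() reduce over the values, not the labels
    return np.arange(column.min(), column.max(), step).reshape(-1, 1)

# Made-up data for illustration
df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "A": [1, 2, 3],
    "B": [45000, 46000, 47000],
})

grid = make_grid(df["A"])
print(grid.shape)
```

Note that arange excludes the stop value, so the grid ends just short of the column maximum.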

Upvotes: 2

Akaisteph7

Reputation: 6476

You should check what is in X and y. They are probably pandas objects containing strings. What you want is to extract the values in X and y and convert them to floats/ints before doing anything with them.

Something like:

X = dataset.iloc[:, 1:2].astype(float)
y = dataset.iloc[:, 2].astype(float)
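
Before converting, it may help to look at what pandas actually parsed. A small sketch, assuming a made-up frame whose numbers were read in as text (the real CSV may well differ):

```python
import pandas as pd

# Hypothetical data where the numeric columns arrived as strings
dataset = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": ["1", "2", "3"],
    "Salary": ["45000", "60000", "100000"],
})

print(dataset.dtypes)   # inspect the parsed column types first

X = dataset.iloc[:, 1:2].astype(float)
y = dataset.iloc[:, 2].astype(float)
print(X.dtypes, y.dtype)

# Take the range from a single column; built-in min() on the whole
# DataFrame would still return a column label
level_min = X["Level"].min()
level_max = X["Level"].max()
label = min(X)
print(level_min, level_max, repr(label))
```

The conversion fixes the dtype, but X is still a DataFrame afterwards, so the arange bounds should come from a column (or from .values) rather than from min(X) and max(X).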

Upvotes: 0

JChao

Reputation: 2319

You need to make sure your inputs have the correct type. It seems to me that both operands here are str. Maybe try converting them to floats with float(x) or a similar function?

Upvotes: 0
