Reputation: 1696
I am trying to implement linear regression using Python.
I did the following steps:
import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1
Then I try to obtain the coefficients, using the following:
regression_coeff = n.polyfit(x,y,1)
And then I get the following error:
raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x
I am unable to get my head around this, because when I print x and y, I can very clearly see that they are both 1D vectors.
Can someone please help?
Dataset can be found here: DataSets
The original code is:
import pandas as p
import numpy as n
data = p.read_csv('...\housing.csv', usecols = [1])
data1 = p.read_csv('...\housing.csv', usecols = [3])
x = data
y = data1
regression = n.polyfit(x, y, 1)
Upvotes: 4
Views: 11285
Reputation: 875
NumPy is telling you that the data is not in the right format: x must be a 1D array, but in your case it is a single-column pandas DataFrame, which is 2D. You can convert your data to a NumPy array and squeeze it to fix the problem.
import pandas as pd
import numpy as np
data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))
x = data
y = data1
regression = np.polyfit(x, y, 1)
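To see the shape problem without the CSV file, here is a minimal sketch that uses small made-up DataFrames in place of the real columns. The column names and values are assumptions for illustration only, not taken from the Housing dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for what read_csv(..., usecols=[k]) returns:
# a DataFrame with a single column, i.e. a 2D object.
frame_x = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})
frame_y = pd.DataFrame({"bedrooms": [1.0, 2.0, 3.0, 4.0]})

print(frame_x.shape)  # two dimensions: (rows, 1)

# Passing the DataFrames directly reproduces the error from the question.
try:
    np.polyfit(frame_x, frame_y, 1)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Converting to a NumPy array and squeezing drops the length-1 axis.
x = np.squeeze(np.array(frame_x))
y = np.squeeze(np.array(frame_y))
print(x.shape)  # one dimension

coeffs = np.polyfit(x, y, 1)  # highest power first: [slope, intercept]
print(coeffs)
```

Once the length-1 axis is gone, polyfit accepts the data and returns two coefficients for a degree-1 fit.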
Upvotes: 2
Reputation: 42905
pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series to match the np.polyfit() input format using .squeeze():
data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv("...path\Housing.csv", usecols=[3]).squeeze()
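A small sketch of what .squeeze() does to a single-column DataFrame, using made-up values rather than the real file. One thing worth knowing: with no argument, squeeze() collapses every length-1 axis, so a file with a single row would come back as a scalar. Passing "columns" limits it to the column axis.

```python
import pandas as pd

# Hypothetical single-column frame, like the result of usecols=[k].
one_col = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

series = one_col.squeeze("columns")  # DataFrame -> Series
print(type(series).__name__, series.shape)

# Edge case: one row and one column.
one_cell = pd.DataFrame({"price": [10.0]})
scalar = one_cell.squeeze()                 # both axes collapse to a scalar
still_series = one_cell.squeeze("columns")  # only the column axis collapses
print(type(still_series).__name__, len(still_series))
```

For the dataset in the question, which has many rows, the plain .squeeze() call and .squeeze("columns") should give the same Series.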
Upvotes: 2
Reputation: 85572
This should work:
np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
data is a dataframe and its values are 2D:
>>> data.values.shape
(546, 1)
flatten() turns it into a 1D array:
>>> data.values.flatten().shape
(546,)
which is needed for polyfit().
Simpler alternative:
df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
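As a rough sketch of the same idea end to end, the example below reads both columns in one call and unpacks the fit. The CSV text is invented so the snippet runs without the real file (it follows bedrooms = 2 * price + 1 exactly), and the lotsize column is only there as a hypothetical extra column to skip.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for Housing.csv.
csv_text = "price,lotsize,bedrooms\n1,100,3\n2,200,5\n3,300,7\n4,400,9\n"

# usecols also accepts column names, so one read is enough.
df = pd.read_csv(io.StringIO(csv_text), usecols=["price", "bedrooms"])

# Indexing a DataFrame by a single column name gives a 1D Series.
# For degree 1, polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(df["price"], df["bedrooms"], 1)
print(slope, intercept)

# Evaluate the fitted line at the original x values.
predicted = np.polyval([slope, intercept], df["price"])
print(predicted)
```

Selecting columns by name avoids having to count column positions, which is easy to get wrong when the file has an unnamed index column.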
Upvotes: 6