Pragyaditya Das

Reputation: 1696

Linear regression using Python (Pandas and Numpy)

I am trying to implement linear regression using Python.

I did the following steps:

import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1

Then I try to obtain the coefficients using the following:

regression_coeff = n.polyfit(x,y,1)

And then I get the following error:

raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

I can't make sense of this: when I print x and y, both clearly look like 1D vectors.

Can someone please help?

Dataset can be found here: DataSets

The original code is:

import pandas as pd
import numpy as np

data = pd.read_csv('...\housing.csv', usecols = [1])
data1 = pd.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = np.polyfit(x, y, 1)

Upvotes: 4

Views: 11285

Answers (3)

Alessandro

Reputation: 875

Python is telling you that the data is not in the right format: x must be a 1D array, but in your case it is a pandas DataFrame, which is two-dimensional even when it holds a single column. You can convert your data to a NumPy array and squeeze it to fix the problem.

import pandas as pd
import numpy as np

data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))

x = data
y = data1
regression = np.polyfit(x, y, 1)
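
To see the shape issue without the CSV file, here is a minimal sketch that uses a made-up single-column DataFrame. The column names `price` and `bedrooms` and all values are illustrative assumptions, not the real Housing data:

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frames standing in for read_csv(..., usecols=[...])
x_frame = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
y_frame = pd.DataFrame({"bedrooms": [2.0, 4.0, 6.0, 8.0]})

# Passing the DataFrames directly reproduces the error from the question
try:
    np.polyfit(x_frame, y_frame, 1)
    raised = False
except TypeError:
    raised = True

# Converting to an array keeps the column axis; squeeze drops it
as_array = np.array(x_frame)
squeezed = np.squeeze(as_array)

print(as_array.shape)   # (4, 1)
print(squeezed.shape)   # (4,)
```

Printing a single-column DataFrame looks like a list of values, which is probably why it appears 1D; checking `.shape` is a more reliable test.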

Upvotes: 2

Stefan

Reputation: 42905

pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series to match the np.polyfit() input format using .squeeze():

data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv("...path\Housing.csv", usecols = [3]).squeeze()
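
As a rough illustration of what .squeeze() does here, the sketch below reads a small in-memory CSV instead of the real file. The CSV text and its column names are assumptions made up for the example:

```python
import io
import pandas as pd

# Hypothetical CSV content; the real Housing.csv has more rows and columns
csv_text = (
    "id,price,lotsize,bedrooms\n"
    "1,42000,5850,3\n"
    "2,38500,4000,2\n"
    "3,49500,3060,3\n"
)

# usecols=[1] selects one column but still returns a DataFrame
frame = pd.read_csv(io.StringIO(csv_text), usecols=[1])

# squeeze() collapses the single-column DataFrame into a Series
series = frame.squeeze()

print(type(frame).__name__, frame.shape)    # DataFrame (3, 1)
print(type(series).__name__, series.shape)  # Series (3,)
```

One caveat: if the frame happened to contain a single row as well, a bare .squeeze() would return a scalar. Calling .squeeze("columns") only collapses the column axis and so always yields a Series.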

Upvotes: 2

Mike Müller

Reputation: 85572

This should work:

np.polyfit(data.values.flatten(), data1.values.flatten(), 1)

data is a dataframe and its values are 2D:

>>> data.values.shape
(546, 1)

flatten() turns it into a 1D array:

>>> data.values.flatten().shape
(546,)

which is needed for polyfit().

Simpler alternative:

df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
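
Once the inputs are 1D, np.polyfit with degree 1 returns two coefficients, highest power first, so the slope comes before the intercept. A small sketch on made-up points (not the Housing data) that lie exactly on y = 2x + 1:

```python
import numpy as np

# Made-up points on the line y = 2x + 1, chosen so the fit is easy to check
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Degree-1 fit: coefficients are ordered from highest power to lowest
slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line at the original x values
predicted = np.polyval([slope, intercept], x)

print(slope, intercept)  # approximately 2 and 1, up to floating-point error
```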

Upvotes: 6
