Shuffling the rows of a pandas DataFrame in Python gives different regression results?

Question

I am trying to randomise the rows of my dataframe (data) before applying linear regression, but I realised that the regression results differ after the rows are randomised, which shouldn't be the case, should it? Here is the code I have tried:

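Without row randomisation:
from sklearn.linear_model import LinearRegression

# data is the original, unshuffled DataFrame
X = data[feature_col]      # feature columns, selected by name
y = data['median_price']   # target column
lr = LinearRegression()
lr.fit(X, y)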

With row randomisation: 
Method 1: 
data = data.sample(frac=1)

Method 2:
data = data.sample(frac=1, axis=1)

Method 3: 
from sklearn.utils import shuffle
data = shuffle(data)

Method 4: 
data = data.sample(frac=1, axis=1).reset_index(drop=True)
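
To see which of the four methods actually reorders rows, here is a small sketch on a toy frame. The columns x1 and x2 and their values are hypothetical, and random_state=0 is added only so the output is repeatable. It prints the index and column order each method produces: a row shuffle permutes the index labels, while axis=1 permutes the column labels and leaves the row order alone.

```python
import pandas as pd
from sklearn.utils import shuffle

# Toy frame: 'x1' and 'x2' are hypothetical feature names; 'median_price' is the target
data = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [0.5, 0.1, 0.9, 0.3, 0.7],
    "median_price": [10.0, 12.0, 15.0, 16.0, 20.0],
})

m1 = data.sample(frac=1, random_state=0)          # Method 1: rows are permuted
m2 = data.sample(frac=1, axis=1, random_state=0)  # Method 2: columns are permuted
m3 = shuffle(data, random_state=0)                # Method 3: rows are permuted
# Method 4: columns are permuted; reset_index only renumbers labels, row order is unchanged
m4 = data.sample(frac=1, axis=1, random_state=0).reset_index(drop=True)

for name, df in [("Method 1", m1), ("Method 2", m2), ("Method 3", m3), ("Method 4", m4)]:
    print(name, "index:", list(df.index), "columns:", list(df.columns))
```

With the same random_state, Methods 2 and 4 produce the same frame, since neither one touches the row order.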

Out of the 4 row randomisation methods I have tried, only Method 4 gives the same results as the version with no randomisation. I thought row randomisation does not affect the regression results in any case?
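
One way to check that expectation directly is to fit the same model on the original frame and on each shuffled variant, then compare the fitted coefficients. The sketch below uses synthetic data: the feature names rooms and age, the generating coefficients, and the random_state values are hypothetical stand-ins, not taken from the question. The comparison uses a tolerance (np.allclose) because reordering rows can change floating-point rounding in the last few digits.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.utils import shuffle

rng = np.random.default_rng(42)
n = 200
# Synthetic stand-in for the question's data; 'rooms' and 'age' are hypothetical features
data = pd.DataFrame({
    "rooms": rng.normal(5, 1, n),
    "age": rng.normal(30, 10, n),
})
data["median_price"] = 50.0 + 3.0 * data["rooms"] - 0.5 * data["age"] + rng.normal(0, 1, n)
feature_col = ["rooms", "age"]


def fit_coefs(df):
    """Fit ordinary least squares on df and return (coef_, intercept_)."""
    lr = LinearRegression()
    # Selecting by column name means column order in df does not matter
    lr.fit(df[feature_col], df["median_price"])
    return lr.coef_, lr.intercept_


base_coef, base_intercept = fit_coefs(data)

variants = {
    "Method 1": data.sample(frac=1, random_state=1),
    "Method 2": data.sample(frac=1, axis=1, random_state=1),
    "Method 3": shuffle(data, random_state=1),
    "Method 4": data.sample(frac=1, axis=1, random_state=1).reset_index(drop=True),
}

results = {}
for name, df in variants.items():
    coef, intercept = fit_coefs(df)
    # Tolerant comparison: row order may only change floating-point rounding
    results[name] = bool(np.allclose(coef, base_coef) and np.isclose(intercept, base_intercept))
    print(name, "matches unshuffled fit:", results[name])
```

If every comparison matches on synthetic data but the original code still shows different results, the difference is probably coming from some step not shown in the question that depends on row order, rather than from the fit itself.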

cosmic_inquiry · Accepted Answer

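Methods 2 and 4 are essentially identical, aren't they? Both pass axis=1, and the extra reset_index(drop=True) in Method 4 does not change the row order.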

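Regression results should not differ (apart from tiny floating-point rounding differences) if you apply the same type of regression to the same data, whether or not the rows are randomised: shuffling rows only reorders the observations, and each row's features and target move together. To randomise the rows of a dataframe you should use axis=0 (the default for DataFrame.sample); axis=1 randomises the columns instead.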
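
A minimal sketch of a row shuffle along axis=0, assuming a toy frame with a hypothetical feature column x1. The random_state argument is optional and only fixes the permutation so it can be reproduced; reset_index(drop=True) renumbers the shuffled rows from 0 without keeping the old index as a column.

```python
import pandas as pd

# Toy frame; 'x1' is a hypothetical feature name, 'median_price' is the target
data = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0],
    "median_price": [10.0, 20.0, 30.0, 40.0],
})

# axis=0 (the default for a DataFrame) shuffles rows; random_state makes it reproducible
shuffled = data.sample(frac=1, axis=0, random_state=0).reset_index(drop=True)
print(shuffled)
```

In the toy data each median_price is 10 times x1, and that pairing survives the shuffle because whole rows are moved; this is why a row shuffle leaves a regression fit unchanged.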
