olzalk

Reputation: 15

Unexpected result from sklearn StandardScaler

I am testing some scalers with the following code. I expect the result to look like the blue distributions, only scaled. What I get instead is the orange one. Can anybody help me?

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


x1=np.random.normal(loc=21,scale=0.2,size=(100,1))
x2=np.random.normal(loc=1000,scale=550,size=(100,1))

data=np.concatenate((x1,x2),axis=1)

df=pd.DataFrame(data,columns=['x1','x2'])

fig1, axs=plt.subplots(nrows=1, ncols=2)
axs[0].hist(df['x1'])
axs[1].hist(df['x2'])

scaler = StandardScaler()
scaler.fit(df)
df_trans=scaler.transform(df)

fig2, axs=plt.subplots(nrows=1,ncols=2)
axs[0].hist(df_trans[0],color='orange')
axs[1].hist(df_trans[1],color='orange')


Upvotes: 0

Views: 61

Answers (1)

Frightera

Reputation: 5079

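scaler.transform returns a NumPy array, not a DataFrame, so df_trans[0] selects the first row (just two values) rather than the first column. Index the columns explicitly instead: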

axs[0].hist(df_trans[:,0],color='orange') # all rows, first column
axs[1].hist(df_trans[:,1],color='orange') # all rows, second column

With that change, each histogram shows a full scaled column, centered near 0 with a standard deviation of about 1.
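To make the row versus column difference concrete, here is a minimal sketch that compares the two indexing forms by shape. It assumes the default scikit-learn output configuration (a NumPy array); the seeded generator and the names first_row, first_col and df_scaled are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Seeded generator so the sketch is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(loc=21, scale=0.2, size=100),
    "x2": rng.normal(loc=1000, scale=550, size=100),
})

# By default the scaler returns a plain NumPy array of shape (100, 2).
df_trans = StandardScaler().fit_transform(df)

first_row = df_trans[0]      # one sample: the scaled x1 and x2 values of row 0
first_col = df_trans[:, 0]   # all 100 scaled x1 values

print(first_row.shape)  # (2,)
print(first_col.shape)  # (100,)

# Optional: wrap the array in a DataFrame again so label-based
# access such as df_scaled["x1"] keeps working.
df_scaled = pd.DataFrame(df_trans, columns=df.columns, index=df.index)
```

If label-based access is preferred, passing df_scaled["x1"] and df_scaled["x2"] to hist should give the same histograms as df_trans[:, 0] and df_trans[:, 1].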

Upvotes: 1
