SwimMaster

Reputation: 381

How to normalize all columns of a pandas DataFrame except the first (key) column

The data looks something like this:

| 2019-08-13 00:30:00 | 1 | 2 | 3 |
| 2019-08-13 01:00:00 | 2 | 3 | 1 |
| 2019-08-13 01:30:00 | 1 | 1 | 1 |
| 2019-08-13 02:00:00 | 1 | 1 | 1 |

The first column is the key for my data, while the remaining columns need to be normalized.

pandas recognizes the relationship when I call .head(n) and shows the date in bold. When I try to normalize the columns, however, the date either disappears, gets normalized along with the data (resulting in all zeros), or the normalization simply fails.

x = df_data.values[:1]

min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df_data[:1] = pd.DataFrame(x_scaled)

How do I normalize all the value columns (1, 2, 3) while keeping the first column (0) in the frame?

Upvotes: 0

Views: 817

Answers (1)

jezrael

Reputation: 862641

Convert the first column to the index, e.g. if the name of the first column is date:

print (df_data)
                  date  a  b  c
0  2019-08-13 00:30:00  1  2  3
1  2019-08-13 01:00:00  2  3  1
2  2019-08-13 01:30:00  1  1  1
3  2019-08-13 02:00:00  1  1  1

import pandas as pd
from sklearn import preprocessing

df_data = df_data.set_index('date')
x = df_data.to_numpy()
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df_data = pd.DataFrame(x_scaled, columns=df_data.columns, index=df_data.index)
print (df_data)
                       a    b    c
date                              
2019-08-13 00:30:00  0.0  0.5  1.0
2019-08-13 01:00:00  1.0  1.0  0.0
2019-08-13 01:30:00  0.0  0.0  0.0
2019-08-13 02:00:00  0.0  0.0  0.0
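
If the date should end up as an ordinary first column again rather than as the index, the same idea can be followed by reset_index. The sketch below is not the scikit-learn code above: it swaps in the plain min-max formula written with pandas arithmetic so that it runs without scikit-learn, and it rebuilds the sample frame from the question. One caveat of this hand-written version is that a constant column (max equal to min) comes out as NaN.

```python
import pandas as pd

# Sample data rebuilt from the question; the column names a, b, c are illustrative.
df_data = pd.DataFrame({
    "date": ["2019-08-13 00:30:00", "2019-08-13 01:00:00",
             "2019-08-13 01:30:00", "2019-08-13 02:00:00"],
    "a": [1, 2, 1, 1],
    "b": [2, 3, 1, 1],
    "c": [3, 1, 1, 1],
})

# Move the key out of the way so only the value columns take part in the math.
indexed = df_data.set_index("date")

# Column-wise min-max scaling: (x - min) / (max - min).
# A constant column would give 0 / 0 here, i.e. NaN.
scaled = (indexed - indexed.min()) / (indexed.max() - indexed.min())

# Turn the index back into a regular first column.
result = scaled.reset_index()
print(result)
```

The scaled values match the output shown above, with date back in position 0.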

In your solution, [:1] slices the first row rather than the columns. Instead, select all columns except the first with DataFrame.iloc: the first : means all rows, and 1: selects all columns excluding the first. Apply the scaler, then assign the result back:

from sklearn import preprocessing

x = df_data.iloc[:, 1:].to_numpy()

min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df_data.iloc[:, 1:] = x_scaled
print (df_data)
                  date    a    b    c
0  2019-08-13 00:30:00  0.0  0.5  1.0
1  2019-08-13 01:00:00  1.0  1.0  0.0
2  2019-08-13 01:30:00  0.0  0.0  0.0
3  2019-08-13 02:00:00  0.0  0.0  0.0
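
To see the difference between the two slices concretely, here is a small sketch on the same sample data. It again uses the plain min-max formula in pandas instead of MinMaxScaler, purely so it runs without scikit-learn, and it assigns the result back by column labels, which replaces the columns outright. That is an assumption-driven alternative to the iloc assignment above: depending on the pandas version, writing floats into integer columns through iloc may emit a dtype warning.

```python
import pandas as pd

# Sample data rebuilt from the question; the column names a, b, c are illustrative.
df_data = pd.DataFrame({
    "date": ["2019-08-13 00:30:00", "2019-08-13 01:00:00",
             "2019-08-13 01:30:00", "2019-08-13 02:00:00"],
    "a": [1, 2, 1, 1],
    "b": [2, 3, 1, 1],
    "c": [3, 1, 1, 1],
})

# values[:1] slices rows: it returns only the first row, date included.
first_row = df_data.values[:1]
print(first_row.shape)

# iloc[:, 1:] keeps every row and drops only the first column.
block = df_data.iloc[:, 1:]
print(block.shape)

# Min-max scale the value columns (a constant column would become NaN here).
scaled = (block - block.min()) / (block.max() - block.min())

# Assign back by column labels; the date column is left untouched.
value_cols = df_data.columns[1:]
df_data[value_cols] = scaled
print(df_data)
```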

Upvotes: 1
