Reputation: 381
The data looks something like this:
| 2019-08-13 00:30:00 | 1 | 2 | 3 |
| 2019-08-13 01:00:00 | 2 | 3 | 1 |
| 2019-08-13 01:30:00 | 1 | 1 | 1 |
| 2019-08-13 02:00:00 | 1 | 1 | 1 |
The first column is the key for my data, while the rest needs to be normalized. pandas recognizes the relationship when I call .head(n) and emboldens the date. When I try to normalize the columns, however, the date either disappears, gets normalized along with the data (resulting in all zeros), or the normalization just fails.
x = df_data.values[:1]
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df_data[:1] = pd.DataFrame(x_scaled)
How do you normalize all the other columns (1, 2, 3) while keeping the first one (0) in the frame?
Upvotes: 0
Views: 817
Reputation: 862641
Convert the first column to the index, e.g. if the first column is named date:
print (df_data)
date a b c
0 2019-08-13 00:30:00 1 2 3
1 2019-08-13 01:00:00 2 3 1
2 2019-08-13 01:30:00 1 1 1
3 2019-08-13 02:00:00 1 1 1
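import pandas as pd
from sklearn import preprocessing
# Move the date column into the index so only the numeric columns get scaled
df_data = df_data.set_index('date')
x = df_data.to_numpy()
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# Rebuild the frame with the original column names and the date index
df_data = pd.DataFrame(x_scaled, columns=df_data.columns, index=df_data.index)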
print (df_data)
a b c
date
2019-08-13 00:30:00 0.0 0.5 1.0
2019-08-13 01:00:00 1.0 1.0 0.0
2019-08-13 01:30:00 0.0 0.0 0.0
2019-08-13 02:00:00 0.0 0.0 0.0
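For anyone who wants to run the index approach end to end, here is a minimal self-contained sketch. The sample frame and the column names date, a, b, c are assumptions that mirror the printed data above, and the final reset_index() is only needed if you want the date back as an ordinary column.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample frame mirroring the data in the question (column names are assumed).
df_data = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-08-13 00:30:00",
        "2019-08-13 01:00:00",
        "2019-08-13 01:30:00",
        "2019-08-13 02:00:00",
    ]),
    "a": [1, 2, 1, 1],
    "b": [2, 3, 1, 1],
    "c": [3, 1, 1, 1],
})

# Move the key column into the index so only numeric columns remain.
indexed = df_data.set_index("date")

# Scale each column to [0, 1] and rebuild a frame with the same labels.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(indexed),
    columns=indexed.columns,
    index=indexed.index,
)

# reset_index() turns the date index back into a regular first column.
result = scaled.reset_index()
print(result)
```

Keeping the date in the index means the scaler never sees it, so nothing has to be excluded by position.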
In your own solution, note that [:1] slices the first row, not the columns. Select all columns except the first with DataFrame.iloc instead: the first : means all rows, and 1: selects all columns excluding the first. Then apply your scaler and assign the result back:
from sklearn import preprocessing
x = df_data.iloc[:, 1:].to_numpy()
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df_data.iloc[:, 1:] = x_scaled
print (df_data)
date a b c
0 2019-08-13 00:30:00 0.0 0.5 1.0
1 2019-08-13 01:00:00 1.0 1.0 0.0
2 2019-08-13 01:30:00 0.0 0.0 0.0
3 2019-08-13 02:00:00 0.0 0.0 0.0
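As a variation, the same min-max scaling can be written in plain pandas, since it is just (x - min) / (max - min) per column. The sketch below is not the code from this answer: it assumes the same sample data and column names, and it assigns by column label rather than by iloc, which replaces the columns outright (as far as I know, recent pandas versions may warn when floats are written in place into integer columns).

```python
import pandas as pd

# Sample frame mirroring the question (column names are assumed).
df_data = pd.DataFrame({
    "date": [
        "2019-08-13 00:30:00",
        "2019-08-13 01:00:00",
        "2019-08-13 01:30:00",
        "2019-08-13 02:00:00",
    ],
    "a": [1, 2, 1, 1],
    "b": [2, 3, 1, 1],
    "c": [3, 1, 1, 1],
})

# Labels of every column except the first (the date key).
cols = df_data.columns[1:]
values = df_data[cols]

# Min-max formula applied column by column: (x - min) / (max - min).
# Assigning by label replaces the integer columns with float ones.
df_data[cols] = (values - values.min()) / (values.max() - values.min())
print(df_data)
```

One difference to keep in mind: a column whose values are all identical has max - min equal to zero, so this formula yields NaN for it, whereas MinMaxScaler guards against the zero range.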
Upvotes: 1