Reputation: 137
I have a dataframe like this:
### To Stack
df_stack = pd.DataFrame([['SGA', 'FT', 5.6994352340698242],
['SGA', 'ST', 1.518241786956787],
['EXT', 'FT', 20.853669309616087],
['EXT', 'ST', 3.5596847057342529],
['LGT', 'FT', 9.853669309616087],
['LGT', 'ST', 8.5596847057342529]], columns = (('idobj','description','value')))
df_stack
idobj description value
SGA FT 5.69944
SGA ST 1.51824
EXT FT 20.85367
EXT ST 3.55968
LGT FT 9.85367
LGT ST 8.5596
I need to apply min-max scaling on parts of data (e.g., apply MinMaxScaler
on 'Description'='ST'
, then apply MinMaxScaler
on 'Description'='ST'
).
When I apply MinMaxScaler
for each "Description" this way:
scaler.fit(df_stack[(df_stack['description']=='FT')]['value'].values.reshape(-1,1))
scaler.transform(df_stack[(df_stack['description']=='FT')]['value'].values.reshape(-1,1))
array([[0. ],
[1. ],
[0.27413]])
scaler.fit(df_stack[(df_stack['description']=='ST')]['value'].values.reshape(-1,1))
scaler.transform(df_stack[(df_stack['description']=='ST')]['value'].values.reshape(-1,1))
array([[0. ],
[0.28992],
[1. ]])
I end up losing the correspondence with my "Description".
I've tried to manually compute by adding a new column like this:
df_stack['description_norm'] = (df_stack[(df_stack['description']=='FT')]['value']-\
df_stack[(df_stack['description']=='FT')]['value'].min())/\
(df_stack[(df_stack['description']=='FT')]['value'].max()-\
df_stack[(df_stack['description']=='FT')]['value'].min())
But then I get another column:
idobj description value description_norm
0 SGA FT 5.69944 0.00000
1 SGA ST 1.51824 NaN
2 EXT FT 20.85367 1.00000
3 EXT ST 3.55968 NaN
4 LGT FT 9.85367 0.27413
5 LGT ST 8.55968 NaN
And when I try to assign the values on this new column to the variable 'Description' based on the condition 'FT' it simply doesn't work?!?
df_stack[(df_stack['description']=='FT')]['value']=df_stack[(df_stack['description']=='FT')]['description_norm']
What am I missing?
The expected result should be :
idobj description value
SGA FT 0
SGA ST 0
EXT FT 1
EXT ST 0.28992
LGT FT 0.27413
LGT ST 1
Any ideas are welcome.
Upvotes: 2
Views: 473
Reputation: 28322
What you want to do is applying a function on a column based on the value of another column, in pandas the best way to do this is by using groupby
.
In this case, we can do the scaling manually like this:
df_stack['value'] = df_stack.groupby('description')['value'].\
apply(lambda x: (x-x.min())/(x.max()-x.min()))
or by using MinMaxScaler
:
df_stack['value'] = df_stack.groupby('description')['value'].\
transform(lambda x: MinMaxScaler().fit_transform(x.values.reshape(-1,1)).flatten())
Both will give the same result:
idobj description value
0 SGA FT 0.000000
1 SGA ST 0.000000
2 EXT FT 1.000000
3 EXT ST 0.289918
4 LGT FT 0.274130
5 LGT ST 1.000000
Upvotes: 2