Reputation: 1521
I am trying to apply a function to a single column of my dataframe (specifically, normalization).
The dataframe looks like this:
     Euclidian         H          N       Volume
222   0.012288   0.00518   0.011143   85203000.0
99    1.296833  -0.80266   1.018583   17519400.0
98    1.618482  -0.60979   1.499213   16263900.0
211   2.237388   0.38073  -2.204757   38375400.0
175   2.313548   0.35656  -2.285907   66974200.0
102   3.319342   3.01295  -1.392897   33201000.0
7     3.424589  -0.31313   3.410243   97924700.0
64    3.720370  -0.03526   3.720203  116514000.0
125   3.995138   0.27396   3.985733   80526200.0
210   4.999969   0.46453   4.978343   70612100.0
The dataframe is named 'discrepancies', and my code is as such:
max = discrepancies['Volume'].max()
discrepancies['Volume'].apply(lambda x: x/max)
return discrepancies
But the column values do not change. I cannot find anything in the documentation about applying a function to a single column; it only covers applying to all columns or all rows:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
Thank you
Upvotes: 1
Views: 584
Reputation: 522
The problem with your code is that apply returns its result as a new Series instead of modifying the column in place. (Many pandas methods accept an inplace argument, but apply does not.)
To correct your code, assign the result back to the column:
max = discrepancies['Volume'].max()
discrepancies['Volume'] = discrepancies['Volume'].apply(lambda x: x/max)
return discrepancies
or you can use @YOBEN_S answer.
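A minimal sketch of the difference, assuming a small stand-in frame built from three of the Volume values in the question (the names vol_max, scaled and unchanged are only for illustration):

```python
import pandas as pd

# Stand-in data: three 'Volume' values copied from the question.
discrepancies = pd.DataFrame({"Volume": [85203000.0, 17519400.0, 116514000.0]})
vol_max = discrepancies["Volume"].max()  # avoids shadowing the builtin max

# apply returns a new Series; the column in the frame is left untouched.
scaled = discrepancies["Volume"].apply(lambda x: x / vol_max)
unchanged = discrepancies["Volume"].max() == vol_max  # still the raw maximum

# Assigning the result back is what actually updates the frame.
discrepancies["Volume"] = scaled
```

After the last line the largest value in the column is 1.0, whereas before the assignment the column still held the raw volumes.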
Upvotes: 1
Reputation: 1420
If it is just a single column, you don't need apply. Dividing the column directly by its maximum will do:
discrepancies['Volume'] = discrepancies['Volume'] / discrepancies['Volume'].max()
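Since the question's snippet ends in a return, here is a rough sketch of how that division might sit inside a function. The function name normalize_volume and the sample frame are hypothetical, and copying the frame first is a design choice of this sketch, not something the question requires:

```python
import pandas as pd

def normalize_volume(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'Volume' divided by its maximum."""
    out = df.copy()  # leave the caller's frame untouched
    out["Volume"] = out["Volume"] / out["Volume"].max()
    return out

# Hypothetical sample using a few rows from the question.
sample = pd.DataFrame({
    "H": [0.00518, -0.80266, -0.03526],
    "Volume": [85203000.0, 17519400.0, 116514000.0],
})
result = normalize_volume(sample)
```

Only the Volume column changes; the other columns and the original frame keep their values. Drop the copy if you do want to modify the frame in place.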
Upvotes: 3
Reputation: 323226
A single column does not need apply, but the result still has to be assigned back:
max = discrepancies['Volume'].max()
discrepancies['some col']=discrepancies['Volume']/max
For a Series you can also use map; again, assign the result back:
max = discrepancies['Volume'].max()
discrepancies['Volume'] = discrepancies['Volume'].map(lambda x: x/max)
Upvotes: 1