Vranvs

Reputation: 1521

Apply function to single column of pandas Dataframe

I am trying to apply a function to a single column of my dataframe (specifically, normalization).

The dataframe looks like this:

     Euclidian        H         N       Volume
222   0.012288  0.00518  0.011143   85203000.0
99    1.296833 -0.80266  1.018583   17519400.0
98    1.618482 -0.60979  1.499213   16263900.0
211   2.237388  0.38073 -2.204757   38375400.0
175   2.313548  0.35656 -2.285907   66974200.0
102   3.319342  3.01295 -1.392897   33201000.0
7     3.424589 -0.31313  3.410243   97924700.0
64    3.720370 -0.03526  3.720203  116514000.0
125   3.995138  0.27396  3.985733   80526200.0
210   4.999969  0.46453  4.978343   70612100.0

The dataframe is named 'discrepancies', and my code is as such:

max = discrepancies['Volume'].max()
discrepancies['Volume'].apply(lambda x: x/max)
return discrepancies

But the column values do not change. I cannot find anything in the documentation about applying a function to a single column; it only talks about applying to all columns or all rows:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

Thank you

Upvotes: 1

Views: 584

Answers (3)

Syed Mohammad Hosseini

Reputation: 522

The problem with your code is that apply returns its result as a new object (here, a new Series) instead of modifying the column in place. (Many pandas functions have an inplace parameter, but apply does not.)

To correct your code, assign the result back to the column:

max = discrepancies['Volume'].max()
discrepancies['Volume'] = discrepancies['Volume'].apply(lambda x: x/max)
return discrepancies

Or you can use @YOBEN_S's answer.
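
To see the difference for yourself, here is a minimal sketch using a three-row sample of the Volume column from the question. The name max_volume is my own choice (it avoids shadowing the built-in max), and the variables scaled and unchanged are only there for illustration.

```python
import pandas as pd

# Small sample taken from the 'Volume' column in the question.
discrepancies = pd.DataFrame({"Volume": [85203000.0, 17519400.0, 16263900.0]})

# Illustrative name; avoids shadowing Python's built-in max().
max_volume = discrepancies["Volume"].max()

# apply() builds a new Series. If the result is discarded, the
# DataFrame column keeps its original values.
scaled = discrepancies["Volume"].apply(lambda x: x / max_volume)
unchanged = discrepancies["Volume"].max() == max_volume

# Assigning the result back is what actually updates the column.
discrepancies["Volume"] = scaled
```

After the last line the column holds the scaled values, while unchanged records that the column was still untouched right after the apply call.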

Upvotes: 1

Toukenize

Reputation: 1420

If it is just a single column, you don't need to use apply. Dividing the column directly by its max will do.

discrepancies['Volume'] = discrepancies['Volume'] / discrepancies['Volume'].max()
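
Since the code in the question ends with a return, it presumably lives inside a function. A rough sketch of how that could look is below; the function name normalize_column and the choice to work on a copy are my assumptions, not something from the question.

```python
import pandas as pd


def normalize_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with `column` divided by its maximum.

    Hypothetical helper for illustration; copying is a design choice
    so that the caller's DataFrame is left unmodified.
    """
    result = df.copy()
    result[column] = result[column] / result[column].max()
    return result


# Three rows sampled from the question's data.
discrepancies = pd.DataFrame(
    {
        "H": [0.00518, -0.80266, -0.60979],
        "Volume": [85203000.0, 17519400.0, 16263900.0],
    },
    index=[222, 99, 98],
)

normalized = normalize_column(discrepancies, "Volume")
```

Only the Volume column of normalized is rescaled; the other columns and the original discrepancies frame stay as they were. If you would rather modify the frame in place, drop the copy and assign directly as in the one-liner above.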

Upvotes: 3

BENY

Reputation: 323226

A single column does not need apply; you can divide it directly, but you still need to assign the result back:

max = discrepancies['Volume'].max()
discrepancies['some col']=discrepancies['Volume']/max

For a Series you can also use map, again assigning the result back:

max = discrepancies['Volume'].max()
discrepancies['Volume'] = discrepancies['Volume'].map(lambda x: x/max)
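
As a quick sanity check, here is a small sketch comparing map with plain division on a sample of the question's Volume values. The variable names (max_volume, mapped, divided) are illustrative only.

```python
import pandas as pd

# Sample values from the question's 'Volume' column.
volume = pd.Series([85203000.0, 17519400.0, 16263900.0], name="Volume")

# Illustrative name; avoids shadowing Python's built-in max().
max_volume = volume.max()

# Element-by-element scaling through a Python lambda.
mapped = volume.map(lambda x: x / max_volume)

# Vectorized scaling; no Python-level function call per element.
divided = volume / max_volume

# Largest absolute difference between the two results.
largest_gap = (mapped - divided).abs().max()
```

Both approaches give the same scaled values here, so the plain division is usually the simpler choice for arithmetic like this, with map kept for cases where there is no vectorized equivalent.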

Upvotes: 1
