Shaw

Reputation: 1139

python pandas - how to apply a normalise function to a dataframe column

I have a pandas dataframe with output similar to the one below:

index    value
0    5.95
1    1.49
2    2.34
3    5.79
4    8.48

I want to compute the normalised version of each value in column 'value' and store it in a new column 'normalised', but I'm not sure how to apply the normalising function to the column...

My normalising function would look like this: (['value'] - min['value']) / (max['value'] - min['value'])
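Written out as an equation, this is min-max scaling. As a worked check on row 0 of the data above (column minimum 1.49, maximum 8.48):

```latex
x'_i = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j},
\qquad
x'_0 = \frac{5.95 - 1.49}{8.48 - 1.49} = \frac{4.46}{6.99} \approx 0.638
```

The smallest value always maps to 0 and the largest to 1.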

I know I should probably be using the apply or transform function to add the new column to the dataframe, but I'm not sure how to pass the normalising function to apply...
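A minimal sketch of passing a normalising function to `apply`, assuming the sample data shown above; the function name `normalise` is hypothetical. `DataFrame.apply` with its default `axis=0` hands each selected column to the function as a whole Series, so `min()` and `max()` are taken over the column rather than over a single value.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"value": [5.95, 1.49, 2.34, 5.79, 8.48]})


def normalise(col: pd.Series) -> pd.Series:
    """Min-max scale a whole column into the range [0, 1]."""
    return (col - col.min()) / (col.max() - col.min())


# DataFrame.apply (default axis=0) calls normalise once per column,
# passing the full column as a Series; we then pick out the 'value' result.
df["normalised"] = df[["value"]].apply(normalise)["value"]

print(df)
```

Because the function receives the entire column, the same call works unchanged if you later select several numeric columns, e.g. `df[["a", "b"]].apply(normalise)` (column names hypothetical).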

Sorry if I'm getting the terminology wrong, but I'm a newbie to Python and in particular pandas!

Upvotes: 1

Views: 444

Answers (3)

Ami Tavory

Reputation: 76297

These are pretty standard column operations:

>>> (df.value - df.value.min()) / (df.value.max() - df.value.min())
0    0.638054
1    0.000000
2    0.121602
3    0.615165
4    1.000000
Name: value, dtype: float64

To store the result in a new column, you can simply write

df['normalized'] = (df.value - ....
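For completeness, a self-contained sketch of that assignment written out in full, assuming the column is named `value` and using the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"value": [5.95, 1.49, 2.34, 5.79, 8.48]})

# Vectorised min-max scaling: min() and max() are computed once for the
# column, then the subtraction and division are broadcast over every row.
col = df["value"]
df["normalized"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

One edge case to keep in mind: if every value in the column is identical, `max() - min()` is zero and the division yields `NaN` in every row rather than raising an error.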

Upvotes: 3

Thomas Kimber

Reputation: 11057

I'd consider using the lambda/apply method, which requires determining the min and max values ahead of time; I'm sure you'll be able to finesse it.

First, write a function that outputs a value based on some 'global' parameters and an input value fetched from a data row.

def norm(vmax, vmin, val):
    return (val-vmin)/(vmax-vmin)

Next, collect your global values from the dataframe:

val_min = df['value'].min()
val_max = df['value'].max()

Finally, you can apply the function, creating a new field to hold the result:

df['new_field'] = df.apply(lambda row: norm(val_max, val_min, row['value']), axis=1)  # argument order must match norm(vmax, vmin, val)

df
    value   new_field
0   5.95    0.638054
1   1.49    0.000000
2   2.34    0.121602
3   5.79    0.615165
4   8.48    1.000000

The beauty of this 'lambda' approach is that you can tweak your functions as you like, which (in my opinion anyway) compartmentalises the code better and allows for reuse, which is always a good thing.
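A runnable sketch of the same idea, with two assumptions beyond the answer: the sample data from the question, and `Series.apply` on the single column in place of the row-wise `df.apply(..., axis=1)`. Both call the Python function once per value; applying to one column simply passes each scalar directly instead of building a row Series for every call. Passing the parameters by keyword guards against swapping `vmax` and `vmin`, which would silently invert the scale.

```python
import pandas as pd

df = pd.DataFrame({"value": [5.95, 1.49, 2.34, 5.79, 8.48]})


def norm(vmax, vmin, val):
    """Scale val into [0, 1] given the column's max and min."""
    return (val - vmin) / (vmax - vmin)


# Collect the 'global' parameters once, outside the per-value function.
val_min = df["value"].min()
val_max = df["value"].max()

# Series.apply calls the lambda once per element; keyword arguments make
# the mapping to norm's parameters explicit regardless of their order.
df["new_field"] = df["value"].apply(lambda v: norm(vmax=val_max, vmin=val_min, val=v))

print(df)
```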

Upvotes: 1

user3274289

Reputation: 2536

Let's call your DataFrame DF.

DF['normalised'] = (DF['value'] - min(DF['value'])) / (max(DF['value']) - min(DF['value']))

does the trick.
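This works on the question's data. A small sketch of one reason the Series methods used in the other answers (`df['value'].min()`) are often the safer choice than Python's built-in `min`/`max`, assuming the column might contain missing values (the question's data does not; the series below is hypothetical):

```python
import math

import pandas as pd

# Hypothetical column with a missing value in first position
s = pd.Series([float("nan"), 5.95, 1.49])

# Python's built-in min() keeps the current item unless a later item
# compares '<' to it; every comparison against NaN is False, so a
# leading NaN is never replaced.
builtin_min = min(s)

# Series.min() skips NaN by default (skipna=True).
pandas_min = s.min()

print(builtin_min, pandas_min)  # nan 1.49
```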

Upvotes: 2
