Huzefa Sadikot
Huzefa Sadikot

Reputation: 581

Normalization Of single Column Of Dataframe

I have a Dataframe as shown:

Dataframe

I have to normalise the Close column of dataframe as follows:

  1. For Each symbol we have to divide the close price on succeeding days by the First close price. This means the normalised closed for APLAPOLLO on 24/11/2020 will be calculated as:

    Normalised_Close_price=(Close on 24/11/2020)/(Close on 23/11/2020)=0.9915( Valid Only For Aplapollo)

  2. Now if the symbol changes the method remains same only the values in the above formulae changes. Hence For AUBANK the normalised closed on 24/11/2020 will be calculated as:

    Normalised_Close_price=(Close on 24/11/2020)/(Close on 23/11/2020)=0.9915( Valid Only for AUBANK)

  3. The normalised close for other symbol should be calculated in the same way.

Hence after the calculation the dataframe should look as shown:

Expected Dataframe

eg: For Aplapollo from above explanation:

  1. normalise value for 24/11/2020=(3219.95/3247.45)=0.991532
  2. normalise value for 25/11/2020=(Close on 25/11/2020)/(Close on 23/11/2020)=0.991686
  3. normalise value for 26/11/2020=(Close on 26/11/2020)/(Close on 23/11/2020)=0.978907

Similarly for other symbols

I have referred to the following answer on normalising dataframe columns:

Normalize columns of pandas data frame

This does not help as the values are changing as per the condition in my case

Upvotes: 1

Views: 270

Answers (2)

Pygirl
Pygirl

Reputation: 13349

Try:

df1['Normalize'] = df1.groupby('Symbol')['Close'].transform(lambda x: x/x.iloc[0]).fillna(1)#.reset_index()

As commented by Shubham:

you can divide groups by its first value by

df['Close'] /= df1.groupby('Symbol')['Close'].transform('first')

:)


df1:

    Date        Symbol      Close   Normalize
0   2020-11-23  APLAPOLLO   3247.45 1.000000
1   2020-11-24  APLAPOLLO   3219.95 0.991532
2   2020-11-25  APLAPOLLO   3220.45 0.991686
3   2020-11-26  APLAPOLLO   3178.95 0.978907
4   2020-11-27  APLAPOLLO   3378.90 1.040478
5   2020-12-01  APLAPOLLO   3446.85 1.061402
6   2020-12-02  APLAPOLLO   3514.55 1.082249
7   2020-12-03  APLAPOLLO   3545.80 1.091872
8   2020-12-04  APLAPOLLO   3708.60 1.142004
9   2020-12-07  APLAPOLLO   3868.55 1.191258
10  2020-12-08  APLAPOLLO   3750.30 1.154845
11  2020-12-09  APLAPOLLO   3801.35 1.170565
12  2020-12-10  APLAPOLLO   3766.65 1.159879
13  2020-12-11  APLAPOLLO   3674.30 1.131442
14  2020-12-14  APLAPOLLO   3814.80 1.174706
15  2020-12-15  APLAPOLLO   780.55  0.240358
16  2020-12-16  APLAPOLLO   790.20  0.243329
17  2020-12-17  APLAPOLLO   791.20  0.243637
18  2020-12-18  APLAPOLLO   769.70  0.237017
19  2020-12-21  APLAPOLLO   726.60  0.223745
20  2020-12-22  APLAPOLLO   744.30  0.229195
21  2020-11-23  AUBANK      869.65  1.000000
22  2020-11-24  AUBANK      874.35  1.005404
23  2020-11-25  AUBANK      856.25  0.984592
24  2020-11-26  AUBANK      861.05  0.990111
25  2020-11-27  AUBANK      839.05  0.964813
26  2020-12-01  AUBANK      872.90  1.003737
27  2020-12-02  AUBANK      886.65  1.019548
28  2020-12-03  AUBANK      880.30  1.012246
29  2020-12-04  AUBANK      880.45  1.012419
30  2020-12-07  AUBANK      898.65  1.033347
31  2020-12-08  AUBANK      907.80  1.043868
32  2020-12-09  AUBANK      918.90  1.056632
33  2020-12-10  AUBANK      911.05  1.047605
34  2020-12-11  AUBANK      920.30  1.058242
35  2020-12-14  AUBANK      929.45  1.068763
36  2020-12-15  AUBANK      922.60  1.060887
37  2020-12-16  AUBANK      915.80  1.053067
38  2020-12-17  AUBANK      943.15  1.084517
39  2020-12-18  AUBANK      897.00  1.031449
40  2020-12-21  AUBANK      840.45  0.966423
41  2020-12-22  AUBANK      856.00  0.984304
42  2020-11-23  AARTIDRUGS  711.70  1.000000

Upvotes: 3

mmrbulbul
mmrbulbul

Reputation: 400

I think what you are looking for is this.

   df["Normalise"] =  df.groupby('Symbol', sort=False)['Close'].rolling(2).apply(lambda x: x.iloc[1] / x.iloc[0]).reset_index(0, drop=True)

Upvotes: 1

Related Questions