Paul

Reputation: 123

Compute a ratio conditional on the value in a column of a pandas DataFrame

I have a dataframe of the following type

   df = pd.DataFrame({'Days':[1,2,5,6,7,10,11,12],
                   'Value':[100.3,150.5,237.0,314.15,188.0,413.0,158.2,268.0]})

      Days Value  
   0  1    100.3  
   1  2    150.5  
   2  5    237.0   
   3  6    314.15  
   4  7    188.0   
   5  10   413.0   
   6  11   158.2   
   7  12   268.0   

and I would like to add a column '+5Ratio' whose data is the ratio between the Value corresponding to Days+5 and the Value at Days.

For example, in the first row I would have 3.13210368893 = 314.15/100.3, in the second I would have 1.24916943522 = 188.0/150.5, and so on.

      Days Value  +5Ratio
   0  1    100.3  3.13210368893   
   1  2    150.5  1.24916943522   
   2  5    237.0   ...  
   3  6    314.15    
   4  7    188.0    
   5  10   413.0    
   6  11   158.2    
   7  12   268.0    

I'm struggling to find a way to do it using a lambda function. Could someone help me find a way to solve this problem? Thanks in advance.

Edit: In the case I am interested in, the "Days" field can vary sparsely, from 1 to 18180 for instance.

Upvotes: 0

Views: 761

Answers (2)

Parfait

Reputation: 107687

Consider left merging onto a helper dataframe, days_df, that holds one row per consecutive day, then shifting by 5 rows for the ratio calculation. Finally, remove the rows for days that were not in the original data:

days_df = pd.DataFrame({'Days':range(min(df.Days), max(df.Days)+1)})
days_df = days_df.merge(df, on='Days', how='left')

print(days_df)    
#     Days   Value
# 0      1  100.30
# 1      2  150.50
# 2      3     NaN
# 3      4     NaN
# 4      5  237.00
# 5      6  314.15
# 6      7  188.00
# 7      8     NaN
# 8      9     NaN
# 9     10  413.00
# 10    11  158.20
# 11    12  268.00

days_df['+5ratio'] = days_df.shift(-5)['Value'] / days_df['Value']    
final_df = days_df[days_df['Value'].notnull()].reset_index(drop=True)

print(final_df)
#    Days   Value   +5ratio
# 0     1  100.30  3.132104
# 1     2  150.50  1.249169
# 2     5  237.00  1.742616
# 3     6  314.15  0.503581
# 4     7  188.00  1.425532
# 5    10  413.00       NaN
# 6    11  158.20       NaN
# 7    12  268.00       NaN
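Since the question mentions that Days can be sparse over a wide range, here is a minimal sketch of a variation that skips the dense helper frame. It is not the approach above but a swapped-in technique: it looks up the Value at Days+5 with `Series.map`, and it assumes each day appears at most once in `df`. The name `value_by_day` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Days': [1, 2, 5, 6, 7, 10, 11, 12],
                   'Value': [100.3, 150.5, 237.0, 314.15, 188.0, 413.0, 158.2, 268.0]})

# Lookup table: Value indexed by day (assumes Days has no duplicates).
value_by_day = df.set_index('Days')['Value']

# For each row, fetch the Value recorded 5 days later; days that are
# not present in the lookup table come back as NaN.
df['+5Ratio'] = (df['Days'] + 5).map(value_by_day) / df['Value']

print(df)
```

Rows whose Days+5 is absent from the data (here days 10, 11 and 12) end up with NaN, the same as in the shifted version.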

Upvotes: 1

BENY

Reputation: 323316

You can use merge; the benefit of doing it this way is that it handles missing values:

s=df.merge(df.assign(Days=df.Days-5),on='Days')
s.assign(Value=s.Value_y/s.Value_x).drop(['Value_x','Value_y'],axis=1)
Out[359]: 
   Days     Value
0     1  3.132104
1     2  1.249169
2     5  1.742616
3     6  0.503581
4     7  1.425532
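The inner merge above returns only the days that have a match 5 days later. If every original row should be kept, with NaN where no match exists, a left merge is one option. This is a rough sketch under that assumption, and the names `future`, `Value_plus5` and `out` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Days': [1, 2, 5, 6, 7, 10, 11, 12],
                   'Value': [100.3, 150.5, 237.0, 314.15, 188.0, 413.0, 158.2, 268.0]})

# Move each row's key back by 5 days so it lines up with the earlier row.
future = df.assign(Days=df['Days'] - 5).rename(columns={'Value': 'Value_plus5'})

# how='left' keeps every row of df; unmatched days get NaN in Value_plus5.
out = df.merge(future, on='Days', how='left')
out['+5Ratio'] = out['Value_plus5'] / out['Value']
out = out.drop(columns='Value_plus5')

print(out)
```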

Upvotes: 1
