Calculate index using base year

Question

df

fruit  year price index_value Boolean  index
apple  1960  11    
apple  1961  12    100        True
apple  1962  13    
apple  1963  13    100        True
banana 1960  11    
banana 1961  12

How can I calculate the index column per fruit, starting again at each row where Boolean is True and carrying on through the following years? The base year is given by the rows where index_value == 100.

I tried:

df['index'] = df.groupby('fruit')['price'].apply(lambda x: (x/x.iloc[0] * 100).round(0))

Expected Output:

fruit  year price index_value Boolean  index
apple  1960  11    
apple  1961  12    100        True      100
apple  1962  13                         108
apple  1963  13    100        True      100
apple  1964  11                         84 
banana 1961  12

Michael Szczesny · Accepted Answer

I took the liberty of adding a row for apple 1964 (price 11) to your input data so that it matches your expected output. The Boolean column is redundant, since it only marks the rows where index_value is set.

import pandas as pd
import numpy as np
import io

t = '''
fruit  year price index_value 
apple  1960  11    
apple  1961  12    100        
apple  1962  13                         
apple  1963  13    100
apple  1964  11 
banana 1960  11  
banana 1961  12
'''

df = pd.read_csv(io.StringIO(t), sep=r'\s+')  # raw string avoids an invalid-escape warning
print(df)

Out:

    fruit  year  price  index_value
0   apple  1960     11          NaN
1   apple  1961     12        100.0
2   apple  1962     13          NaN
3   apple  1963     13        100.0
4   apple  1964     11          NaN
5  banana  1960     11          NaN
6  banana  1961     12          NaN

To get your desired output, first create subgroups that each start at a row with a given index_value. Rows before the first base year of a fruit end up in group 0:

df['groups'] = df.index_value.notna().groupby(df.fruit).cumsum().astype('int')
print(df)

Out:

    fruit  year  price  index_value  groups
0   apple  1960     11          NaN       0
1   apple  1961     12        100.0       1
2   apple  1962     13          NaN       1
3   apple  1963     13        100.0       2
4   apple  1964     11          NaN       2
5  banana  1960     11          NaN       0
6  banana  1961     12          NaN       0
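
The cumulative-sum step may be easier to follow in isolation. The following is only an illustrative sketch on toy data; the names `keys`, `is_base` and `group_id` are made up for this example and are not part of the answer's code.

```python
import pandas as pd

# Hypothetical toy data: True marks a base-year row.
keys = pd.Series(['a', 'a', 'a', 'a', 'b', 'b'])
is_base = pd.Series([False, True, False, True, False, False])

# Within each key, every True bumps the running count by one, so a base-year
# row and the rows that follow it (up to the next base year) share one id.
# Rows before the first base year keep id 0.
group_id = is_base.groupby(keys).cumsum().astype(int)
print(group_id.tolist())  # [0, 1, 1, 2, 0, 0]
```

Key `b` has no base year at all, so all of its rows stay in group 0, just like banana above.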

Then you can express each price as a percentage of the price in the base year of its subgroup (the first row of each subgroup), leaving out group 0:

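# Skip group 0 (rows before the first base year). transform keeps the original
# row index, so the result aligns with df on assignment in any pandas version.
df['index_change'] = (
    df[df.groups.ne(0)]
      .groupby(['fruit', 'groups'])['price']
      .transform(lambda x: np.floor(x / x.iloc[0] * 100))
)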
print(df)

Out:

    fruit  year  price  index_value  groups  index_change
0   apple  1960     11          NaN       0           NaN
1   apple  1961     12        100.0       1         100.0
2   apple  1962     13          NaN       1         108.0
3   apple  1963     13        100.0       2         100.0
4   apple  1964     11          NaN       2          84.0
5  banana  1960     11          NaN       0           NaN
6  banana  1961     12          NaN       0           NaN
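
If you prefer a single reusable step, the two stages above could be combined roughly as follows. This is a sketch rather than a definitive implementation: the function name `add_base_year_index` is hypothetical, and it assumes the rows are already sorted by year within each fruit and that the base index is always 100, as in your data.

```python
import numpy as np
import pandas as pd


def add_base_year_index(df):
    """Return a copy of df with an 'index' column rebased at each base year."""
    out = df.copy()
    is_base = out['index_value'].notna()
    # Segment id per fruit: 0 before the first base year, then 1, 2, ...
    segment = is_base.groupby(out['fruit']).cumsum()
    # Price of the first row in each (fruit, segment), i.e. the base-year price.
    base_price = out.groupby([out['fruit'], segment])['price'].transform('first')
    index = np.floor(out['price'] / base_price * 100)
    # Rows before any base year have no reference price, so leave them empty.
    out['index'] = index.where(segment.ne(0))
    return out


df = pd.DataFrame({
    'fruit': ['apple'] * 5 + ['banana'] * 2,
    'year': [1960, 1961, 1962, 1963, 1964, 1960, 1961],
    'price': [11, 12, 13, 13, 11, 11, 12],
    'index_value': [np.nan, 100, np.nan, 100, np.nan, np.nan, np.nan],
})
result = add_base_year_index(df)
print(result)
```

Using `transform('first')` instead of a lambda avoids calling Python code once per group, which may matter on larger frames. Like the answer above, it floors the result, which is what yields 84 rather than 85 for apple 1964.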
