asd

Calculate index using base year

df

fruit  year price index_value Boolean  index
apple  1960  11    
apple  1961  12    100        True
apple  1962  13    
apple  1963  13    100        True
banana 1960  11    
banana 1961  12    

How can I calculate the index column per fruit, starting from each row marked True? The base years are the rows where index_value == 100, and each following year should be indexed against the price of the most recent base year.

I tried:

df['index'] = df.groupby('fruit')['price'].apply(lambda x: (x/x.iloc[0] * 100).round(0))
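The attempt divides every price by the first price of its fruit, so the base year is always the fruit's earliest row rather than the rows flagged with index_value == 100. The minimal sketch below uses the question's input data, transcribed from the table above, to make this visible. It uses `transform` instead of `apply` because `transform` returns a Series aligned to the original row index that can be assigned straight back to `df`. In recent pandas versions, `apply` on a grouped Series may prepend the group key to the result's index, which can then fail to align on assignment.

```python
import pandas as pd

# Input from the question (blank index_value cells become NaN)
df = pd.DataFrame({
    'fruit': ['apple', 'apple', 'apple', 'apple', 'banana', 'banana'],
    'year': [1960, 1961, 1962, 1963, 1960, 1961],
    'price': [11, 12, 13, 13, 11, 12],
    'index_value': [None, 100, None, 100, None, None],
})

# Same formula as the attempt, but via transform so the result keeps df's index.
# The base is each fruit's FIRST price, not the flagged base years.
df['index'] = (
    df.groupby('fruit')['price']
      .transform(lambda x: (x / x.iloc[0] * 100).round(0))
)
print(df[['fruit', 'year', 'price', 'index']])
```

Here apple 1962 comes out as 118 instead of the expected 108, because its price of 13 is compared with the 1960 price of 11 rather than the 1961 base-year price of 12.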

Expected Output:

fruit  year price index_value Boolean  index
apple  1960  11    
apple  1961  12    100        True      100
apple  1962  13                         108
apple  1963  13    100        True      100
apple  1964  11                         84 
banana 1961  12    
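Reading the expected output, each row appears to be indexed against the price of the most recent base year b(t) of the same fruit (the latest year up to and including t whose index_value is 100) and then rounded down. Rounding down matters for apple 1964, where rounding to the nearest integer would give 85 instead of 84:

```latex
\text{index}_t = \left\lfloor \frac{100 \cdot \text{price}_t}{\text{price}_{b(t)}} \right\rfloor

\text{apple 1962: } \left\lfloor \tfrac{100 \cdot 13}{12} \right\rfloor = \lfloor 108.3\ldots \rfloor = 108
\qquad
\text{apple 1964: } \left\lfloor \tfrac{100 \cdot 11}{13} \right\rfloor = \lfloor 84.6\ldots \rfloor = 84
```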

Answers (1)

Michael Szczesny

I took the liberty of adjusting your input data by adding a row for apple 1964 with a price of 11, to match your expected output. The Boolean column is redundant, since it is True exactly where index_value is 100, so it is left out.

import pandas as pd
import numpy as np
import io

t = '''
fruit  year price index_value 
apple  1960  11    
apple  1961  12    100        
apple  1962  13                         
apple  1963  13    100
apple  1964  11 
banana 1960  11  
banana 1961  12
'''

df = pd.read_csv(io.StringIO(t), sep=r'\s+')
print(df)

Out:

    fruit  year  price  index_value
0   apple  1960     11          NaN
1   apple  1961     12        100.0
2   apple  1962     13          NaN
3   apple  1963     13        100.0
4   apple  1964     11          NaN
5  banana  1960     11          NaN
6  banana  1961     12          NaN

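To get your desired output, first create subgroups for the rows following each base year: within each fruit, every non-null index_value starts a new group, and rows before the first base year fall into group 0.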

df['groups'] = df.index_value.notna().groupby(df.fruit).cumsum().astype('int')
print(df)

Out:

    fruit  year  price  index_value  groups
0   apple  1960     11          NaN       0
1   apple  1961     12        100.0       1
2   apple  1962     13          NaN       1
3   apple  1963     13        100.0       2
4   apple  1964     11          NaN       2
5  banana  1960     11          NaN       0
6  banana  1961     12          NaN       0
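The mechanism: `notna()` marks each base year with True, and a cumulative sum of those flags counts how many base years have been seen so far. Because the sum runs inside `groupby(df.fruit)`, the count restarts for every fruit. A stripped-down sketch on hypothetical flags; casting to int before summing is a choice made here so the result dtype does not depend on how a given pandas version sums booleans:

```python
import pandas as pd

fruit = pd.Series(['apple', 'apple', 'apple', 'banana', 'banana'])
# Hypothetical flags: True marks a base year (a row with an index_value)
is_base = pd.Series([True, False, True, False, True])

# Running count of base years seen so far, restarted for each fruit
groups = is_base.astype(int).groupby(fruit).cumsum()
print(groups.tolist())  # [1, 1, 2, 0, 1]
```

Note that a fruit whose first row is already a base year (apple here) has no group 0, while banana's leading row stays in group 0.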

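Then compute each price as a percentage of the first price in its subgroup (the base year), rounded down. Rows in group 0 come before any base year, so they are excluded and stay NaN.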

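df['index_change'] = (
    df[df.groups.ne(0)]
      .groupby(['fruit', 'groups'])['price']
      # transform keeps the original row index, so the result aligns with df;
      # multiplying before dividing keeps exact ratios exact before flooring
      .transform(lambda x: np.floor(x * 100 / x.iloc[0]))
)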
print(df)

Out:

    fruit  year  price  index_value  groups  index_change
0   apple  1960     11          NaN       0           NaN
1   apple  1961     12        100.0       1         100.0
2   apple  1962     13          NaN       1         108.0
3   apple  1963     13        100.0       2         100.0
4   apple  1964     11          NaN       2          84.0
5  banana  1960     11          NaN       0           NaN
6  banana  1961     12          NaN       0           NaN
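An alternative that avoids the lambda is sketched below: `transform('first')` broadcasts each subgroup's base-year price to every row, and `where` blanks out group 0. It assumes rows are sorted by year within each fruit and that prices are numeric. The helper name `add_base_year_index` is hypothetical.

```python
import numpy as np
import pandas as pd


def add_base_year_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: index each price against its fruit's latest base year."""
    out = df.copy()
    # A new subgroup starts at every row with an index_value, restarting per fruit
    groups = out['index_value'].notna().astype(int).groupby(out['fruit']).cumsum()
    # Base-year price (first row of each fruit/subgroup) broadcast to every row
    base = out.groupby([out['fruit'], groups])['price'].transform('first')
    # Floor the percentage; rows before the first base year (group 0) become NaN
    out['index'] = np.floor(out['price'] * 100 / base).where(groups.ne(0))
    return out


df = pd.DataFrame({
    'fruit': ['apple'] * 5 + ['banana'] * 2,
    'year': [1960, 1961, 1962, 1963, 1964, 1960, 1961],
    'price': [11, 12, 13, 13, 11, 11, 12],
    'index_value': [np.nan, 100, np.nan, 100, np.nan, np.nan, np.nan],
})
result = add_base_year_index(df)
print(result)
```

Working on a copy leaves the caller's frame unchanged, and no helper `groups` column is added to the result.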
