Reputation: 1309
df
fruit year price index_value Boolean index
apple 1960 11
apple 1961 12 100 True
apple 1962 13
apple 1963 13 100 True
banana 1960 11
banana 1961 12
How can I calculate the index column, per fruit, for the years following each True row? The base year is given by the rows where index_value == 100.
I tried:
df['index'] = df.groupby('fruit')['price'].apply(lambda x: (x/x.iloc[0] * 100).round(0))
Expected Output:
fruit year price index_value Boolean index
apple 1960 11
apple 1961 12 100 True 100
apple 1962 13 108
apple 1963 13 100 True 100
apple 1964 11 84
banana 1961 12
Upvotes: 3
Views: 418
Reputation: 5026
I took the liberty of adding a row for apple 1964 11 to your input data so that it matches your output example. The Boolean column is redundant, so I left it out.
import pandas as pd
import numpy as np
import io
t = '''
fruit year price index_value
apple 1960 11
apple 1961 12 100
apple 1962 13
apple 1963 13 100
apple 1964 11
banana 1960 11
banana 1961 12
'''
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
print(df)
Out:
fruit year price index_value
0 apple 1960 11 NaN
1 apple 1961 12 100.0
2 apple 1962 13 NaN
3 apple 1963 13 100.0
4 apple 1964 11 NaN
5 banana 1960 11 NaN
6 banana 1961 12 NaN
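To get your desired output, first create subgroups per fruit: a counter that increases by one at every row that has an index_value, so each base year and the years after it share one label.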
df['groups'] = df.index_value.notna().groupby(df.fruit).cumsum().astype('int')
print(df)
Out:
fruit year price index_value groups
0 apple 1960 11 NaN 0
1 apple 1961 12 100.0 1
2 apple 1962 13 NaN 1
3 apple 1963 13 100.0 2
4 apple 1964 11 NaN 2
5 banana 1960 11 NaN 0
6 banana 1961 12 NaN 0
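To see why the cumulative sum produces these labels, here is a minimal sketch on a single toy series (the values are made up for illustration and ignore the per-fruit grouping):

```python
import pandas as pd

# Hypothetical toy series: 100.0 marks a base year, NaN marks every other year.
index_value = pd.Series([None, 100.0, None, 100.0, None], dtype="float64")

# notna() is True at each base year; the running sum goes up by one there,
# so every row from a base year up to the next one shares the same label.
segments = index_value.notna().cumsum()
print(segments.tolist())  # [0, 1, 1, 2, 2]
```

Rows labelled 0 come before the first base year, which is why they are excluded in the next step.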
Then compute each price relative to the base-year price (the first row of its subgroup), skipping subgroup 0, which has no base year:
df['index_change'] = (
    df[df.groups.ne(0)]
    .groupby(['fruit', 'groups'])['price']
    # transform keeps the original row index, so the result aligns with df on assignment
    .transform(lambda x: np.floor(x / x.iloc[0] * 100))
)
print(df)
Out:
fruit year price index_value groups index_change
0 apple 1960 11 NaN 0 NaN
1 apple 1961 12 100.0 1 100.0
2 apple 1962 13 NaN 1 108.0
3 apple 1963 13 100.0 2 100.0
4 apple 1964 11 NaN 2 84.0
5 banana 1960 11 NaN 0 NaN
6 banana 1961 12 NaN 0 NaN
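If you prefer both steps in one place, something along these lines should work. It is only a sketch: the function name add_price_index and the helper names segment and base are my own, and it assumes the rows are already sorted by year within each fruit. It uses floor rather than round because that is what matches the 84 in your expected output.

```python
import numpy as np
import pandas as pd

def add_price_index(df):
    """Return a copy of df with an 'index' column rebased to 100 at each base year."""
    out = df.copy()
    # Label segments: the counter increases at every base-year row within a fruit.
    seg = out["index_value"].notna().groupby(out["fruit"]).cumsum().rename("segment")
    # Price of the first row of each (fruit, segment), i.e. the base-year price.
    base = out.groupby(["fruit", seg])["price"].transform("first")
    # Rows before the first base year (segment 0) have no base, so they become NaN.
    out["index"] = np.floor(out["price"] / base * 100).where(seg > 0)
    return out

df = pd.DataFrame({
    "fruit": ["apple"] * 5 + ["banana"] * 2,
    "year": [1960, 1961, 1962, 1963, 1964, 1960, 1961],
    "price": [11, 12, 13, 13, 11, 11, 12],
    "index_value": [np.nan, 100, np.nan, 100, np.nan, np.nan, np.nan],
})
result = add_price_index(df)
print(result)
```

This avoids the intermediate groups column and the Python-level lambda, which may matter on larger frames.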
Upvotes: 1