Reputation: 481
Given the following dataframe
df = pd.DataFrame(data={'name': ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c'],
'lag': [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
'value': range(10)})
print(df)
lag name value
0 1 a 0
1 1 a 1
2 1 a 2
3 2 b 3
4 2 b 4
5 2 b 5
6 2 b 6
7 2 c 7
8 2 c 8
9 2 c 9
I am trying to shift values contained in column value
to obtain the column expected_value
, which is the shifted values grouped by column name
and shifted by lag
rows. I was thinking of using something like df['expected_value'] = df.groupby(['name', 'lag']).shift()
, but I am not sure how to pass lag
to the shift()
function.
print(df)
lag name value expected_value
0 1 a 0 nan
1 1 a 1 0.0000
2 1 a 2 1.0000
3 2 b 3 nan
4 2 b 4 nan
5 2 b 5 3.0000
6 2 b 6 4.0000
7 2 c 7 nan
8 2 c 8 nan
9 2 c 9 7.0000
Upvotes: 1
Views: 196
Reputation: 20669
You can use GroupBy.transform
here.
df.assign(expected_value = df.groupby(['name', 'lag'])['value'].
transform(lambda x: x.shift(x.name[1])))
name lag value expected_value
0 a 1 0 NaN
1 a 1 1 0.0
2 a 1 2 1.0
3 b 2 3 NaN
4 b 2 4 NaN
5 b 2 5 3.0
6 b 2 6 4.0
7 c 2 7 NaN
8 c 2 8 NaN
9 c 2 9 7.0
Upvotes: 2
Reputation: 150735
You can do with an apply
:
df['new_val'] = (df.groupby('name')
.apply(lambda x: x['value'].shift(x['lag'].iloc[0]))
.reset_index('name',drop=True)
)
Output:
name lag value new_val
0 a 1 0 NaN
1 a 1 1 0.0
2 a 1 2 1.0
3 b 2 3 NaN
4 b 2 4 NaN
5 b 2 5 3.0
6 b 2 6 4.0
7 c 2 7 NaN
8 c 2 8 NaN
9 c 2 9 7.0
Upvotes: 2