Reputation: 127
I am attempting to perform a dynamic shift within a groupby object. My grouping is Account, and for each account the column Valuation should be shifted by minus the number of rows given in the column Shift. There was a similar question a while ago, but that involved a cumsum, whereas here I just want the shifted value. See dynamic shift with groupby on dataframe. If possible I'd like to avoid an apply for performance reasons, as I have tens of millions of rows.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Account': [1000001, 1000001, 1000001, 1000001, 1000001, 1000001, 1000001,
1000001, 1000001, 1000001, 1000002, 1000002, 1000002, 1000002,
1000002, 1000002, 1000002, 1000002, 1000002],
'Date': ['Jan-18', 'Feb-18', 'Mar-18', 'Apr-18', 'May-18', 'Jun-18',
'Jul-18', 'Aug-18', 'Sep-18', 'Oct-18', 'Jan-18', 'Feb-18',
'Mar-18', 'Apr-18', 'May-18', 'Jun-18', 'Jul-18', 'Aug-18',
'Sep-18'],
'Valuation':[ 50000, 51000, 52020, 53060, 54122, 55204, 56308, 57434,
58583, 59755, 100000, 102000, 104040, 106121, 108243, 110408,
112616, 114869, 117166],
'Shift': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2] })
The desired dataframe looks like this:
Upvotes: 2
Views: 355
Reputation: 8033
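Check this out.
def sh(x):
    # x is one account's 'Valuation'; look up that account's shift by index
    s = df.loc[x.index, 'Shift']
    return x.shift(-int(s.iloc[0]))

# group_keys=False keeps the original index so the result aligns with df
df['Valuation_shifted'] = df.groupby('Account', group_keys=False)['Valuation'].apply(sh)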
I know you said you did not want to use apply. In this case, though, we are not applying a lambda. Instead, we apply a named function that reads the first value of the column 'Shift' in each group and shifts that group's 'Valuation' by that many rows, storing the result in 'Valuation_shifted'.
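To see what the per-group function does, here is a minimal sketch on a made-up toy frame (the `toy` data and the name `shift_group` are hypothetical, not from the question). It uses `transform` rather than `apply`, which is my substitution: `transform` returns a result aligned to the original index, so it can be assigned straight back as a column.

```python
import pandas as pd

# Hypothetical toy data: two accounts, each with one constant shift.
toy = pd.DataFrame({
    'Account':   [1, 1, 1, 2, 2, 2, 2],
    'Valuation': [10, 11, 12, 20, 21, 22, 23],
    'Shift':     [1, 1, 1, 2, 2, 2, 2],
})

def shift_group(values):
    # values is one account's 'Valuation'; its index labels let us
    # look up that account's shift in the parent frame.
    periods = int(toy.loc[values.index, 'Shift'].iloc[0])
    return values.shift(-periods)

# transform keeps the original index, so the result lines up with toy.
toy['Valuation_shifted'] = (
    toy.groupby('Account')['Valuation'].transform(shift_group)
)
print(toy)
```

Account 1 is shifted up by one row and account 2 by two rows, with NaN filling the tail of each group. The function still runs once per account, so with many accounts this is likely to be slow.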
Upvotes: 1
Reputation: 59549
You likely have far more unique accounts than shifts, so instead we will loop over the small number of distinct shift values. Given the sorting on 'Account', a where check that Account is equal to the shifted Account ensures each shifted value stays within its group.
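import pandas as pd

# Start from an empty Series with an explicit dtype so the result stays numeric
s = pd.Series(dtype='float64')
for shift in df.Shift.unique():
    # Shift only the rows with this shift value, then blank out any value
    # that would come from a different Account
    u = (df[df.Shift.eq(shift)].Valuation.shift(-shift)
           .where(df.Account.eq(df.Account.shift(-shift))))
    s = s.combine_first(u)

df['Valuation Shifted'] = s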
Account Date Valuation Shift Valuation Shifted
0 1000001 Jan-18 50000 3 53060.0
1 1000001 Feb-18 51000 3 54122.0
2 1000001 Mar-18 52020 3 55204.0
3 1000001 Apr-18 53060 3 56308.0
4 1000001 May-18 54122 3 57434.0
5 1000001 Jun-18 55204 3 58583.0
6 1000001 Jul-18 56308 3 59755.0
7 1000001 Aug-18 57434 3 NaN
8 1000001 Sep-18 58583 3 NaN
9 1000001 Oct-18 59755 3 NaN
10 1000002 Jan-18 100000 2 104040.0
11 1000002 Feb-18 102000 2 106121.0
12 1000002 Mar-18 104040 2 108243.0
13 1000002 Apr-18 106121 2 110408.0
14 1000002 May-18 108243 2 112616.0
15 1000002 Jun-18 110408 2 114869.0
16 1000002 Jul-18 112616 2 117166.0
17 1000002 Aug-18 114869 2 NaN
18 1000002 Sep-18 117166 2 NaN
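To illustrate why the where check matters, here is a small sketch on a made-up frame (the `toy` data and the variable names are hypothetical). It assumes, as above, that rows are sorted by Account. It compares a plain shift, which pulls values across the account boundary, with the masked version.

```python
import pandas as pd

# Hypothetical toy data, sorted by Account.
toy = pd.DataFrame({
    'Account':   [1, 1, 1, 2, 2, 2],
    'Valuation': [10, 11, 12, 20, 21, 22],
})
shift = 2

# A plain shift ignores group boundaries: rows 1 and 2 pick up
# values that belong to account 2.
naive = toy['Valuation'].shift(-shift)

# Keep a shifted value only when the row `shift` positions ahead
# belongs to the same Account.
same_account = toy['Account'].eq(toy['Account'].shift(-shift))
masked = naive.where(same_account)

print(pd.DataFrame({'naive': naive, 'masked': masked}))
```

Only rows 0 and 3 keep a value after masking. Every other row would either read from the next account or run past the end of the frame.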
Upvotes: 2