user2335564

Reputation: 127

Pandas dynamic Groupby and Shift

I am attempting to perform a dynamic shift within a groupby object. My grouping is Account, and within each account the column Valuation should be shifted by minus the number of rows specified in the column Shift. There was a similar question a while ago, but that involved a cumsum, whereas here I just want the value. See "dynamic shift with groupby on dataframe". If possible I'd like to avoid an apply for performance reasons, as I have tens of millions of rows.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Account': [1000001, 1000001, 1000001, 1000001, 1000001, 1000001, 1000001,
                1000001, 1000001, 1000001, 1000002, 1000002, 1000002, 1000002,
                1000002, 1000002, 1000002, 1000002, 1000002],
    'Date': ['Jan-18', 'Feb-18', 'Mar-18', 'Apr-18', 'May-18', 'Jun-18',
             'Jul-18', 'Aug-18', 'Sep-18', 'Oct-18', 'Jan-18', 'Feb-18',
             'Mar-18', 'Apr-18', 'May-18', 'Jun-18', 'Jul-18', 'Aug-18',
             'Sep-18'],
    'Valuation': [50000, 51000, 52020, 53060, 54122, 55204, 56308, 57434,
                  58583, 59755, 100000, 102000, 104040, 106121, 108243, 110408,
                  112616, 114869, 117166],
    'Shift': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2]})

The desired dataframe looks like this:

[image of the desired dataframe not preserved]

Upvotes: 2

Views: 355

Answers (2)

moys

Reputation: 8033

Check this out.

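def sh(x):
    # Look up this group's shift; it is assumed constant within an account.
    s = df.loc[x.index, 'Shift']
    return x.shift(-int(s.iloc[0]))

# group_keys=False keeps the original row index so the result aligns with df.
df['Valuation_shifted'] = df.groupby('Account', group_keys=False)['Valuation'].apply(sh)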

I know you said you wanted to avoid apply. Here, though, we are not applying a lambda row by row. Instead, the function is called once per group: it reads the first value of the column 'Shift' in that group and shifts 'Valuation' by that many rows.
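To make the once-per-group behaviour concrete, here is a minimal sketch on a made-up toy frame. The frame `toy` and the helper `shift_group` are hypothetical names used only for illustration, and the sketch loops over the groups explicitly instead of calling apply, so it does not depend on how apply handles group keys.

```python
import pandas as pd

# Hypothetical toy data: account 1 shifts by 2 rows, account 2 by 1 row.
toy = pd.DataFrame({
    'Account':   [1, 1, 1, 2, 2],
    'Valuation': [10, 20, 30, 100, 200],
    'Shift':     [2, 2, 2, 1, 1],
})

def shift_group(group):
    # Called once per account: read the group's (constant) shift
    # and move Valuation up by that many rows.
    return group['Valuation'].shift(-int(group['Shift'].iloc[0]))

# Each piece keeps its original row labels, so concatenating the
# pieces gives a Series that aligns with toy on assignment.
pieces = [shift_group(g) for _, g in toy.groupby('Account')]
toy['Valuation_shifted'] = pd.concat(pieces)
print(toy)
```

The cost still grows with the number of accounts, since there is one Python-level call per group.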

Upvotes: 1

ALollz

Reputation: 59549

You likely have far more unique accounts than unique shifts, so instead we can loop over the small number of distinct shift values. Because the frame is sorted on 'Account', a where that checks Account against the shifted Account ensures each value comes from within the same group.

import pandas as pd

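# Explicit float dtype: the default dtype of an empty Series differs between pandas versions.
s = pd.Series(dtype='float64')
for shift in df.Shift.unique():
    # Shift only the rows that use this shift, then blank out any value
    # that was pulled from a different account.
    u = (df[df.Shift.eq(shift)].Valuation.shift(-shift)
           .where(df.Account.eq(df.Account.shift(-shift))))
    s = s.combine_first(u)

df['Valuation Shifted'] = s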

    Account    Date  Valuation  Shift  Valuation Shifted
0   1000001  Jan-18      50000      3            53060.0
1   1000001  Feb-18      51000      3            54122.0
2   1000001  Mar-18      52020      3            55204.0
3   1000001  Apr-18      53060      3            56308.0
4   1000001  May-18      54122      3            57434.0
5   1000001  Jun-18      55204      3            58583.0
6   1000001  Jul-18      56308      3            59755.0
7   1000001  Aug-18      57434      3                NaN
8   1000001  Sep-18      58583      3                NaN
9   1000001  Oct-18      59755      3                NaN
10  1000002  Jan-18     100000      2           104040.0
11  1000002  Feb-18     102000      2           106121.0
12  1000002  Mar-18     104040      2           108243.0
13  1000002  Apr-18     106121      2           110408.0
14  1000002  May-18     108243      2           112616.0
15  1000002  Jun-18     110408      2           114869.0
16  1000002  Jul-18     112616      2           117166.0
17  1000002  Aug-18     114869      2                NaN
18  1000002  Sep-18     117166      2                NaN
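If even the loop over distinct shifts is a concern, the same "is the target row in the same account?" check can be written with NumPy positional indexing. This is only a sketch of a variant, not the code above: the function name `shift_within_account` and its parameters are hypothetical, and it assumes the rows are sorted so that each account is contiguous and that every shift is non-negative.

```python
import numpy as np
import pandas as pd

def shift_within_account(frame, key='Account', value='Valuation', by='Shift'):
    """Look `by` rows ahead in `value`, keeping only same-`key` targets.

    Assumes each account occupies contiguous rows and shifts are >= 0.
    """
    n = len(frame)
    target = np.arange(n) + frame[by].to_numpy()   # position each row reads from
    in_range = target < n
    safe = np.where(in_range, target, 0)           # placeholder for out-of-range rows
    keys = frame[key].to_numpy()
    same = in_range & (keys[safe] == keys)         # target lies in the same account
    vals = frame[value].to_numpy(dtype='float64')
    return pd.Series(np.where(same, vals[safe], np.nan), index=frame.index)

# Hypothetical toy data: account 1 shifts by 2 rows, account 2 by 1 row.
toy = pd.DataFrame({
    'Account':   [1, 1, 1, 2, 2],
    'Valuation': [10, 20, 30, 100, 200],
    'Shift':     [2, 2, 2, 1, 1],
})
toy['Valuation Shifted'] = shift_within_account(toy)
print(toy)
```

Because the target position is computed per row, this variant would also tolerate shifts that differ between rows of the same account, which the question's data does not need.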

Upvotes: 2
