Pandas, how to apply changes dataframe with groupby() / conditional function calling?

Question

I have a huge timeseries dataframe with over 1m rows. It has list of monthly stock returns and I'd like to create a new row to track the rolling sum of 3 previous months. The dataframe has all company A rows first, then all company B rows, then all company C rows...

For example:

          date                COMNAM    PRC    RET
395 2017-02-28       GAS NATURAL INC 12.650  0.000
396 2017-03-31       GAS NATURAL INC 12.700  0.010
397 2017-04-28       GAS NATURAL INC 12.500 -0.016
398 2017-05-31       GAS NATURAL INC 12.700  0.016
399 2017-06-30       GAS NATURAL INC 12.925  0.024
400 2017-07-31       GAS NATURAL INC 12.950  0.002
401 2017-08-31       GAS NATURAL INC    nan    nan
402 1985-12-31                   NaN    nan    nan
403 1986-01-31  MOBILE NATIONAL CORP 11.625    nan
404 1986-02-28  MOBILE NATIONAL CORP 13.250  0.140
405 1986-03-31  MOBILE NATIONAL CORP 14.188  0.071
406 1986-04-30  MOBILE NATIONAL CORP 14.938  0.053
407 1986-05-30  MOBILE NATIONAL CORP 14.625 -0.021
408 1986-06-30  MOBILE NATIONAL CORP 12.688 -0.132
409 1986-07-31  MOBILE NATIONAL CORP 13.312  0.049
410 1986-08-29  MOBILE NATIONAL CORP 13.312  0.000
411 1986-09-30  MOBILE NATIONAL CORP 14.250  0.070
412 1986-10-31  MOBILE NATIONAL CORP 13.375 -0.061
413 1986-11-28  MOBILE NATIONAL CORP 13.375  0.000
414 1986-12-31  MOBILE NATIONAL CORP 12.375 -0.075

The rolling()-function would give me the sum of 3 previous months, but that would include the last returns of the previous stock for the first dates of each company. I have a feeling the groupby() function might be able to help, but I'm kinda stuck on how. Or am I overthinking it, and there is some better way where I don't even need the groupby?

Valdi_Bo · Accepted Answer

To compute the rolling sum of 3 previous months (without the current month), for two columns of interest, from the current group of rows, define the following function:

def mySum(grp):
    return grp[['PRC', 'RET']].shift().rolling(3).sum()

Then, to get such rolling sums for each group (company), run:

result = df.join(df[df.COMNAM.notnull()].groupby('COMNAM').apply(mySum)\
    .reset_index(level=0, drop=True).add_prefix('r'))

The result is a join between the current df and the result of invocation of the above function for each group (company). Column names of the intermediate result are prepended with r, to mark the rolling sum.

For your data sample the result is:

         date                COMNAM     PRC    RET    rPRC   rRET
0  2017-02-28       GAS NATURAL INC  12.650  0.000     NaN    NaN
1  2017-03-31       GAS NATURAL INC  12.700  0.010     NaN    NaN
2  2017-04-28       GAS NATURAL INC  12.500 -0.016     NaN    NaN
3  2017-05-31       GAS NATURAL INC  12.700  0.016  37.850 -0.006
4  2017-06-30       GAS NATURAL INC  12.925  0.024  37.900  0.010
5  2017-07-31       GAS NATURAL INC  12.950  0.002  38.125  0.024
6  2017-08-31       GAS NATURAL INC     NaN    NaN  38.575  0.042
7  1985-12-31                   NaN     NaN    NaN     NaN    NaN
8  1986-01-31  MOBILE NATIONAL CORP  11.625    NaN     NaN    NaN
9  1986-02-28  MOBILE NATIONAL CORP  13.250  0.140     NaN    NaN
10 1986-03-31  MOBILE NATIONAL CORP  14.188  0.071     NaN    NaN
11 1986-04-30  MOBILE NATIONAL CORP  14.938  0.053  39.063    NaN
12 1986-05-30  MOBILE NATIONAL CORP  14.625 -0.021  42.376  0.264
13 1986-06-30  MOBILE NATIONAL CORP  12.688 -0.132  43.751  0.103
14 1986-07-31  MOBILE NATIONAL CORP  13.312  0.049  42.251 -0.100
15 1986-08-29  MOBILE NATIONAL CORP  13.312  0.000  40.625 -0.104
16 1986-09-30  MOBILE NATIONAL CORP  14.250  0.070  39.312 -0.083
17 1986-10-31  MOBILE NATIONAL CORP  13.375 -0.061  40.874  0.119
18 1986-11-28  MOBILE NATIONAL CORP  13.375  0.000  40.937  0.009
19 1986-12-31  MOBILE NATIONAL CORP  12.375 -0.075  41.000  0.009

If you want to "ignore" NaN values (treat them as 0), change the function to:

def mySum(grp):
    return grp[['PRC', 'RET']].fillna(0).shift().rolling(3).sum()

Pandas, how to apply changes dataframe with groupby() / conditional function calling?

Answers (1)

Related Questions