Rolling Difference for Intervals of Rows

Question

I'm fairly new at Python, and kinda stuck. I need to add a column to a dataframe containing the rolling difference of a column's value. However, I need this count to reset in intervals of 3. Best I could do was

df['Value'].rolling(window=2).apply(lambda x: x[1] - x[0])

But I don't know how to reset the diff count. I've experimented with groupby, but with no results.

The table at hand looks like this

ID    Location   Year    Value

01       EU      2000    42.402
02       EU      2001    44.336
03       EU      2002    46.477
04       US      2000    0.456
06       US      2001    0.438
07       US      2002    0.437

The desired results should look like this:

ID    Location   Year    Value      RD

01       EU      2000    42.402    Null
02       EU      2001    44.336    1.934
03       EU      2002    46.477    2.141
04       US      2000    0.456     Null
06       US      2001    0.438    -0.124
07       US      2002    0.437     0.001

Thanks a lot.

jezrael · Accepted Answer

Use DataFrame.groupby by integer division by 3 for general solution with arange by length of DataFrame:

idx = np.arange(len(df)) // 3
#if default index
#idx = df.index // 3
s = df.groupby(idx)['Value'].rolling(window=2).apply(lambda x: x[1] - x[0], 'raw=False')
df['RD'] = s.reset_index(level=0, drop=True)
print (df)
   ID Location  Year   Value     RD
0   1       EU  2000  42.402    NaN
1   2       EU  2001  44.336  1.934
2   3       EU  2002  46.477  2.141
3   4       US  2000   0.456    NaN
4   6       US  2001   0.438 -0.018
5   7       US  2002   0.437 -0.001

Rolling Difference for Intervals of Rows

Answers (1)

Related Questions