Reputation: 661

Compute the difference between the rows before and after a specific row (pandas)

I want to find the difference between the rows before and after a specific row. Specifically, I have the following dataset:

 Number of rows       A             
 1                    4             
 2                    2             
 3                    2             
 4                    3             
 5                    2

I should get the following data:

 Number of rows      A        B
 1                   4        NaN (there is no row before this one)
 2                   2        2   (4-2)
 3                   2       -1   (2-3)
 4                   3        0   (2-2)
 5                   2        NaN (there is no row after this one)

As you can see, each value in column B equals the difference between the previous and the following row in column A. For example, the second row of column B equals the value in the first row of column A minus the value in the third row of column A.

IMPORTANT: I do not actually need only the immediately previous and following rows. I need the difference between the row 2 positions before and the row 2 positions after. That is, the value in row 23 of column B should equal the value in row 21 of column A minus the value in row 25 of column A. I used the adjacent rows above only for simplicity. I hope this explains it.
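To pin down the rule being asked for, here is a minimal plain-Python sketch (the helper name `diff_around` is hypothetical, not from any library): item i is `values[i - n] - values[i + n]`, or NaN when either neighbour falls outside the data. It can serve as a reference to check a pandas solution against.

```python
import math

def diff_around(values, n):
    """Hypothetical helper: values[i - n] - values[i + n] for each i,
    or NaN when the row n positions before or after does not exist."""
    out = []
    for i in range(len(values)):
        if i - n < 0 or i + n >= len(values):
            out.append(math.nan)  # missing neighbour -> NaN, as in the expected table
        else:
            out.append(values[i - n] - values[i + n])
    return out

A = [4, 2, 2, 3, 2]
print(diff_around(A, 1))  # [nan, 2, -1, 0, nan]
print(diff_around(A, 2))  # [nan, nan, 2, nan, nan]
```

With n = 1 this reproduces the expected column B above; n = 2 is the "2 rows before minus 2 rows after" case.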

Upvotes: 1

Answers (3)

PabTorre

Reputation: 3127

You need to use .shift on the column (series) where you want to run your calculation.

With shift(1) you get the previous row, with shift(-1) you get the next row.

From there, compute previous - next:

>>> import pandas as pd
>>> s = pd.Series([4,2,2,3,2])
>>> s
0    4
1    2
2    2
3    3
4    2
dtype: int64
# previous 
>>> s.shift(1)
0    NaN
1    4.0
2    2.0
3    2.0
4    3.0
dtype: float64
# next 
>>> s.shift(-1)
0    2.0
1    2.0
2    3.0
3    2.0
4    NaN
dtype: float64
# previous - next 
>>> s.shift(1)-s.shift(-1)
0    NaN
1    2.0
2   -1.0
3    0.0
4    NaN
dtype: float64
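The question ultimately asks for 2 rows before and 2 rows after. A minimal sketch of the same shift idea with a shift distance N, assuming the data sits in a DataFrame column named A and the result should go into a new column B (as in the question's tables):

```python
import pandas as pd

# Data from the question; N = 2 means "row i - 2 minus row i + 2"
df = pd.DataFrame({"A": [4, 2, 2, 3, 2]})
N = 2

# shift(N) moves values down N rows (row i sees row i - N);
# shift(-N) moves them up N rows (row i sees row i + N).
# Rows without a neighbour at that distance get NaN.
df["B"] = df["A"].shift(N) - df["A"].shift(-N)
print(df)
```

With only 5 rows and N = 2, just the middle row has both neighbours, so B is 4 - 2 = 2.0 there and NaN elsewhere; setting N = 1 gives the table from the question.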

Upvotes: 1

rafaelc

Reputation: 59284

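Seems like you need a centered rolling window. You can specify that with the argument center=True. Passing raw=True makes apply hand the lambda a NumPy array, so s[0] and s[-1] are the first and last values of each window (with the default raw=False the lambda receives a Series, and s[0] is then looked up by index label rather than by position).

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [4, 2, 2, 3, 2]})
>>> df.A.rolling(3, center=True).apply(lambda s: s[0] - s[-1], raw=True)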

0    NaN
1    2.0
2   -1.0
3    0.0
4    NaN
Name: A, dtype: float64

This approach works for any window size. Note that this is a centered window, so its size has to be N + N + 1, where N is the number of rows to look back and ahead, plus 1 for the value in the middle. Thus, the general formula is

window = 2*N + 1

If you need 2 rows before and 2 after, then N = 2 (window = 5). If you need 5 and 5, N = 5 (window = 11), and so on. The lambda passed to apply stays the same.
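A minimal sketch of the N = 2 case with the window formula above, assuming the question's data in column A. It also compares the result with the shift-based answers; both should agree, because the first and last elements of a full centered window of size 2N + 1 are exactly the rows N before and N after the centre.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 2, 2, 3, 2]})
N = 2
window = 2 * N + 1  # N rows before + the centre row + N rows after

# raw=True passes a NumPy array, so w[0] and w[-1] are positional:
# the first and last value of each centered window.
# Incomplete windows at the edges yield NaN (min_periods defaults to window).
rolled = df["A"].rolling(window, center=True).apply(lambda w: w[0] - w[-1], raw=True)

# Same quantity via shift, for comparison
shifted = df["A"].shift(N) - df["A"].shift(-N)
print(rolled.equals(shifted))  # True
```

The shift version avoids calling a Python function per window, so it is likely the faster choice on large frames; the rolling version is handy if you later need something other than first minus last.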

Upvotes: 2

Denziloe

Reputation: 8162

Let the series (i.e. DataFrame column) be s.

You want:

s.shift(1) - s.shift(-1)

Upvotes: 2
