Pandas — match last identical row and compute difference

Question

With a DataFrame like the following:

         timestamp    value
0       2012-01-01      3.0
1       2012-01-05      3.0
2       2012-01-06      6.0
3       2012-01-09      3.0
4       2012-01-31      1.0
5       2012-02-09      3.0
6       2012-02-11      1.0
7       2012-02-13      3.0
8       2012-02-15      2.0
9       2012-02-18      5.0

What would be an elegant and efficient way to add a time_since_last_identical column, so that the previous example would result in:

         timestamp    value   time_since_last_identical
0       2012-01-01      3.0                         NaT
1       2012-01-05      3.0                      4 days
2       2012-01-06      6.0                         NaT
3       2012-01-09      3.0                      4 days
4       2012-01-31      1.0                         NaT
5       2012-02-09      3.0                     31 days
6       2012-02-11      1.0                     11 days
7       2012-02-13      3.0                      4 days
8       2012-02-15      2.0                         NaT
9       2012-02-18      5.0                         NaT

The use of time differences is incidental. Any solution that matches each row with the previous row holding the same value, and computes something from those two rows (here, a difference), is valid.

Note: I am not interested in apply- or loop-based approaches.

VinceP · Accepted Answer

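A simple groupby will do the trick. Group by value, select the timestamp column explicitly, and take the diff within each group (this assumes timestamp has a datetime dtype and the rows are in chronological order):

df['time_since_last_identical'] = df.groupby('value')['timestamp'].diff()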

Gives:

   timestamp  value  time_since_last_identical
0 2012-01-01    3.0                        NaT
1 2012-01-05    3.0                     4 days
2 2012-01-06    6.0                        NaT
3 2012-01-09    3.0                     4 days
4 2012-01-31    1.0                        NaT
5 2012-02-09    3.0                    31 days
6 2012-02-11    1.0                    11 days
7 2012-02-13    3.0                     4 days
8 2012-02-15    2.0                        NaT
9 2012-02-18    5.0                        NaT
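
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the example frame from the question (assuming the timestamps are parsed with pd.to_datetime and are already sorted), applies the groupby diff, and then shows one possible way to cover the more general case from the question: groupby(...).shift() fetches the previous row with the same value, after which any vectorised computation can be applied to the pair. The names prev_timestamp and days_since_last_identical are illustrative only and do not appear in the question.

```python
import pandas as pd

# Example data from the question; timestamps parsed to datetime64.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-01-01", "2012-01-05", "2012-01-06", "2012-01-09", "2012-01-31",
        "2012-02-09", "2012-02-11", "2012-02-13", "2012-02-15", "2012-02-18",
    ]),
    "value": [3.0, 3.0, 6.0, 3.0, 1.0, 3.0, 1.0, 3.0, 2.0, 5.0],
})

# Difference to the previous row that has the same value.
# The first occurrence of each value has no predecessor, so it gets NaT.
df["time_since_last_identical"] = df.groupby("value")["timestamp"].diff()

# More general pattern: pull the previous matching row's column with shift(),
# then compute whatever is needed from the current and previous values.
prev_timestamp = df.groupby("value")["timestamp"].shift()  # hypothetical name
df["days_since_last_identical"] = (df["timestamp"] - prev_timestamp).dt.days

print(df)
```

If the rows are not already in chronological order, sorting with df.sort_values('timestamp') before grouping should give the same per-group differences.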
