Reputation: 12940
On Pandas Dataframe, we can compute difference between values on different rows of a given column (or different columns of a given rows): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html
How can we compute a series that correspond to the difference of values of 2 different columns in consecutive rows.
For example, if we have a start date and an end date, how can we compute the time between the end date of row N and the start date of row N+1?
>>> df = pd.DataFrame({"Start": pd.date_range("2020-01-01", "2020-01-07", freq="2D"), "End":pd.date_range("2020-01-02", "2020-01-11", freq="3D")})
>>> df
Start End
0 2020-01-01 2020-01-02
1 2020-01-03 2020-01-05
2 2020-01-05 2020-01-08
3 2020-01-07 2020-01-11
# and then we would call something like that
>>> delta = df.ndiff(columns=["End", "Start"])
0 NaT
1 1 days
2 0 days
3 -1 days
dtype: timedelta64[ns]
Upvotes: 0
Views: 37
Reputation: 12940
Elaboration on the answer from @wwnde (put as comment)
The idea is to compare rows after a shift:
>>> df['Start'] - df['End'].shift()
0 NaT
1 1 days
2 0 days
3 -1 days
dtype: timedelta64[ns]
We can also use the column names as attributes:
>>> df.Start - df.End.shift()
In other words, the function s.diff(periods)
(same for a Dataframe) is equivalent to s - s.shift(periods)
.
Upvotes: 1