Reputation: 603
I have the following dataframe:
import pandas as pd

df = pd.DataFrame(
{
"group": [1,1,1,2,2],
"type": ["initial", "update", "update", "initial", "update"],
"update time": ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00", "2019-01-02 12:00:00", "2019-01-04 12:00:00"],
"finish time": ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00", "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
}
)
df["update time"] = pd.to_datetime(df["update time"])
df["finish time"] = pd.to_datetime(df["finish time"])
df
For every row, I want to calculate the difference between the 'finish time' and the 'update time' of the 'initial' row of its 'group'. As in the example, the 'finish time' can change.
The desired output is:
I guess that groupby is a good starting point, but I can't figure out the whole solution. Any ideas?
Thanks a lot!
Upvotes: 1
Views: 93
Reputation: 75080
Use:
df['finish time']-df.groupby('group')['update time'].transform('first')
Upvotes: 5
Reputation: 323226
We can use transform:
df['finish time']-df.groupby('group')['update time'].transform('first')
Out[229]:
0 6 days
1 6 days
2 7 days
3 3 days
4 3 days
dtype: timedelta64[ns]
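Note that transform('first') picks whichever row comes first in each group, so it relies on the 'initial' row being listed first, as it is in the question's data. If that ordering is not guaranteed, one possible variant is to look up the 'initial' row by its type value instead. This is only a sketch: it assumes exactly one 'initial' row per group, the rows below are the question's data reordered for illustration, and the column name "time to finish" is made up here.

```python
import pandas as pd

# The question's data, reordered so that group 1's 'initial' row is NOT first.
df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["update", "initial", "update", "initial", "update"],
        "update time": pd.to_datetime(
            ["2019-01-03 12:00:00", "2019-01-01 12:00:00", "2019-01-05 12:00:00",
             "2019-01-02 12:00:00", "2019-01-04 12:00:00"]
        ),
        "finish time": pd.to_datetime(
            ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00",
             "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
        ),
    }
)

# Series indexed by group holding each group's 'initial' update time.
# Assumption: exactly one 'initial' row per group.
initial_time = df.loc[df["type"] == "initial"].set_index("group")["update time"]

# Map every row's group to its initial time, then subtract.
# "time to finish" is a hypothetical column name.
df["time to finish"] = df["finish time"] - df["group"].map(initial_time)
print(df["time to finish"])
```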
Upvotes: 4
Reputation: 59264
Use transform('first') to broadcast the first value of update time in each group to every row of that group. Then it is a simple subtraction:
df['finish time'] - df.groupby('group')['update time'].transform('first')
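To make the broadcast step visible, the one-liner can be split into two steps. This is a minimal sketch using the question's data; the variable name first_update and the column name "time to finish" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["initial", "update", "update", "initial", "update"],
        "update time": pd.to_datetime(
            ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00",
             "2019-01-02 12:00:00", "2019-01-04 12:00:00"]
        ),
        "finish time": pd.to_datetime(
            ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00",
             "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
        ),
    }
)

# Step 1: the first 'update time' of each group, repeated on every row of
# that group. The result has the same index and length as df.
first_update = df.groupby("group")["update time"].transform("first")

# Step 2: row-wise subtraction; the result is a timedelta column.
df["time to finish"] = df["finish time"] - first_update
print(df[["group", "type", "time to finish"]])
```

Because transform returns a Series aligned with the original index, the subtraction needs no merge or join.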
Upvotes: 5