Julian

Reputation: 603

Python pandas: how to do operations within a group?

I have the following dataframe:

df = pd.DataFrame(
    {
    "group": [1,1,1,2,2],
    "type": ["initial", "update", "update", "initial", "update"],
    "update time": ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00", "2019-01-02 12:00:00", "2019-01-04 12:00:00"],
    "finish time": ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00", "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
    }
)

df["update time"] = pd.to_datetime(df["update time"])
df["finish time"] = pd.to_datetime(df["finish time"])

df


For every row, I want to calculate the difference between its 'finish time' and the 'update time' of the 'initial' row of its 'group'. As the example shows, the 'finish time' can change within a group.

The desired output is:

[image: table showing the desired output]

I guess that groupby is a good starting point, but I can't figure out the whole solution. Any ideas?

Thanks a lot!

Upvotes: 1

Views: 93

Answers (3)

anky

Reputation: 75080

Use:

df['finish time']-df.groupby('group')['update time'].transform('first')
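For completeness, here is a minimal self-contained sketch of the expression above applied to the data from the question, with the result stored in a new column. The column name "duration" is just an illustrative choice, not something from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["initial", "update", "update", "initial", "update"],
        "update time": pd.to_datetime(
            ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00",
             "2019-01-02 12:00:00", "2019-01-04 12:00:00"]
        ),
        "finish time": pd.to_datetime(
            ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00",
             "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
        ),
    }
)

# Broadcast each group's first 'update time' to every row of that group.
initial_time = df.groupby("group")["update time"].transform("first")

# "duration" is a hypothetical column name chosen for this example.
df["duration"] = df["finish time"] - initial_time
print(df[["group", "type", "duration"]])
```

Note that this assumes the 'initial' row is the first row of each group, as it is in the sample data.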

Upvotes: 5

BENY

Reputation: 323226

We can use transform:

df['finish time']-df.groupby('group')['update time'].transform('first')
Out[229]: 
0   6 days
1   6 days
2   7 days
3   3 days
4   3 days
dtype: timedelta64[ns]

Upvotes: 4

rafaelc

Reputation: 59264

Use transform('first') to broadcast each group's first 'update time' value to every row of that group, so the result has the same shape as the original column. Then simply subtract:

df['finish time'] - df.groupby('group')['update time'].transform('first')
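transform('first') relies on the 'initial' row being the first row of each group. If that ordering is not guaranteed, one possible alternative (a sketch, assuming exactly one 'initial' row per group) is to look up the 'initial' row explicitly and map it back by group. The rows below are the question's data deliberately reordered, and the names initial, by_first and by_type are illustrative only.

```python
import pandas as pd

# Same data as in the question, but reordered so that the 'initial'
# row is no longer the first row of each group.
df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["update", "initial", "update", "update", "initial"],
        "update time": pd.to_datetime(
            ["2019-01-03 12:00:00", "2019-01-01 12:00:00", "2019-01-05 12:00:00",
             "2019-01-04 12:00:00", "2019-01-02 12:00:00"]
        ),
        "finish time": pd.to_datetime(
            ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00",
             "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
        ),
    }
)

# Series of the 'initial' update time, indexed by group.
# Assumes exactly one 'initial' row per group.
initial = df.loc[df["type"] == "initial"].set_index("group")["update time"]

# Look up each row's group in that Series, then subtract.
by_type = df["finish time"] - df["group"].map(initial)

# For comparison: transform('first') picks whatever row comes first.
by_first = df["finish time"] - df.groupby("group")["update time"].transform("first")

print(by_type)
print(by_first)
```

With this ordering the two results differ, because by_first subtracts an 'update' row's time instead of the 'initial' one. Sorting the frame first so that 'initial' leads each group would be another way to keep using transform('first').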

Upvotes: 5
