Reputation: 603

Python pandas: how to do operations within a group?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["initial", "update", "update", "initial", "update"],
        "update time": ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00", "2019-01-02 12:00:00", "2019-01-04 12:00:00"],
        "finish time": ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00", "2019-01-05 12:00:00", "2019-01-05 12:00:00"],
    }
)

df["update time"] = pd.to_datetime(df["update time"])
df["finish time"] = pd.to_datetime(df["finish time"])

df

For every row, I want to calculate the difference between that row's 'finish time' and the 'update time' of the 'initial' row of its 'group'. As the example shows, the 'finish time' can change from row to row within a group.

The desired output is:

I guess that groupby is a good starting point, but I can't figure out the whole solution. Any ideas?

Thanks a lot!

Upvotes: 1

Answers (3)

anky

Reputation: 75080

Use:

df['finish time']-df.groupby('group')['update time'].transform('first')

Upvotes: 5

BENY

Reputation: 323226

We can use transform:

df['finish time']-df.groupby('group')['update time'].transform('first')
Out[229]: 
0   6 days
1   6 days
2   7 days
3   3 days
4   3 days
dtype: timedelta64[ns]
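
Note that transform('first') picks whichever row comes first in each group, so it matches the question only while the 'initial' row is the first row of its group. If that ordering is not guaranteed, a minimal sketch that looks up the 'initial' row explicitly might look like the following. It assumes exactly one 'initial' row per group, and the names initial_update and diff are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["initial", "update", "update", "initial", "update"],
        "update time": pd.to_datetime(
            ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00",
             "2019-01-02 12:00:00", "2019-01-04 12:00:00"]
        ),
        "finish time": pd.to_datetime(
            ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00",
             "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
        ),
    }
)

# One 'update time' per group, taken from the row whose type is 'initial'.
# Assumption: each group has exactly one such row, so the index is unique.
initial_update = df.loc[df["type"] == "initial"].set_index("group")["update time"]

# Look up each row's group in that Series, then subtract row by row.
diff = df["finish time"] - df["group"].map(initial_update)
print(diff)
```

Because the lookup goes through the 'type' column rather than row position, the result stays the same if the rows are reordered.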

Upvotes: 4

rafaelc

Reputation: 59264

Use transform('first') to broadcast the first 'update time' of each group to every row of that group, so the result has the same shape as the original column. Then it is a simple subtraction:

df['finish time'] - df.groupby('group')['update time'].transform('first')
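
For completeness, here is a self-contained sketch of the same idea that keeps the broadcast step visible and stores the result in the dataframe. The column name 'time to finish' and the variable first_update are made up for illustration, and the code assumes the 'initial' row comes first in each group, as in the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 2, 2],
        "type": ["initial", "update", "update", "initial", "update"],
        "update time": pd.to_datetime(
            ["2019-01-01 12:00:00", "2019-01-03 12:00:00", "2019-01-05 12:00:00",
             "2019-01-02 12:00:00", "2019-01-04 12:00:00"]
        ),
        "finish time": pd.to_datetime(
            ["2019-01-07 12:00:00", "2019-01-07 12:00:00", "2019-01-08 12:00:00",
             "2019-01-05 12:00:00", "2019-01-05 12:00:00"]
        ),
    }
)

# transform returns a Series aligned with df's index: every row receives
# the first 'update time' of its own group.
first_update = df.groupby("group")["update time"].transform("first")

# Row-wise subtraction; the result has dtype timedelta64[ns].
df["time to finish"] = df["finish time"] - first_update
print(df[["group", "type", "time to finish"]])
```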

Upvotes: 5
