user3854325
user3854325

Reputation: 69

add timedelta data within a group in pandas dataframe

I am working on a dataframe in pandas with four columns of user_id, time_stamp1, time_stamp2, and interval. Time_stamp1 and time_stamp2 are of type datetime64[ns] and interval is of type timedelta64[ns].

I want to sum up interval values for each user_id in the dataframe and I tried to calculate it in many ways as:

1)df["duration"]=   df.groupby('user_id')['interval'].apply (lambda x: x.sum())

2)df ["duration"]=  df.groupby('user_id').aggregate (np.sum)

3)df ["duration"]=  df.groupby('user_id').agg (np.sum)

but none of them work and the value of the duration will be NaT after running the codes.

Upvotes: 4

Views: 3296

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

UPDATE: you can use transform() method:

In [291]: df['duration'] = df.groupby('user_id')['interval'].transform('sum')

In [292]: df
Out[292]:
                    a  user_id                   b          interval           duration
0 2016-01-01 00:00:00     0.01 2015-11-11 00:00:00  51 days 00:00:00  838 days 08:00:00
1 2016-03-10 10:39:00     0.01 2015-12-08 18:39:00               NaT  838 days 08:00:00
2 2016-05-18 21:18:00     0.01 2016-01-05 13:18:00 134 days 08:00:00  838 days 08:00:00
3 2016-07-27 07:57:00     0.01 2016-02-02 07:57:00 176 days 00:00:00  838 days 08:00:00
4 2016-10-04 18:36:00     0.01 2016-03-01 02:36:00 217 days 16:00:00  838 days 08:00:00
5 2016-12-13 05:15:00     0.01 2016-03-28 21:15:00 259 days 08:00:00  838 days 08:00:00
6 2017-02-20 15:54:00     0.02 2016-04-25 15:54:00 301 days 00:00:00 1454 days 00:00:00
7 2017-05-01 02:33:00     0.02 2016-05-23 10:33:00 342 days 16:00:00 1454 days 00:00:00
8 2017-07-09 13:12:00     0.02 2016-06-20 05:12:00 384 days 08:00:00 1454 days 00:00:00
9 2017-09-16 23:51:00     0.02 2016-07-17 23:51:00 426 days 00:00:00 1454 days 00:00:00

OLD answer:

Demo:

In [260]: df
Out[260]:
                    a                   b          interval  user_id
0 2016-01-01 00:00:00 2015-11-11 00:00:00  51 days 00:00:00        1
1 2016-03-10 10:39:00 2015-12-08 18:39:00               NaT        1
2 2016-05-18 21:18:00 2016-01-05 13:18:00 134 days 08:00:00        1
3 2016-07-27 07:57:00 2016-02-02 07:57:00 176 days 00:00:00        1
4 2016-10-04 18:36:00 2016-03-01 02:36:00 217 days 16:00:00        1
5 2016-12-13 05:15:00 2016-03-28 21:15:00 259 days 08:00:00        1
6 2017-02-20 15:54:00 2016-04-25 15:54:00 301 days 00:00:00        2
7 2017-05-01 02:33:00 2016-05-23 10:33:00 342 days 16:00:00        2
8 2017-07-09 13:12:00 2016-06-20 05:12:00 384 days 08:00:00        2
9 2017-09-16 23:51:00 2016-07-17 23:51:00 426 days 00:00:00        2

In [261]: df.dtypes
Out[261]:
a            datetime64[ns]
b            datetime64[ns]
interval    timedelta64[ns]
user_id               int64
dtype: object

In [262]: df.groupby('user_id')['interval'].sum()
Out[262]:
user_id
1    838 days 08:00:00
2   1454 days 00:00:00
Name: interval, dtype: timedelta64[ns]

In [263]: df.groupby('user_id')['interval'].apply(lambda x: x.sum())
Out[263]:
user_id
1    838 days 08:00:00
2   1454 days 00:00:00
Name: interval, dtype: timedelta64[ns]

In [264]: df.groupby('user_id').agg(np.sum)
Out[264]:
                  interval
user_id
1        838 days 08:00:00
2       1454 days 00:00:00

So check your data...

Upvotes: 1

Related Questions