Reputation: 305
I have a pandas dataframe that holds user ids and datetimes for certain events. Now I want to get the relative time that each event happened with respect to the first event of that specific user.
user_id date
A 2016-03-02 18:15:43
A 2016-01-10 17:58:57
B 2017-03-22 07:52:00
B 2017-03-27 10:41:00
I found a solution that works but seems way too much effort. I believe there is a much more elegant way to do this.
#get earliest datetime per user
start = lambda x: x.min()
start.__name__ = 'start_date'
min_dates = df.groupby('user_id').agg({'date':[start]})
#merge back to dataframe
df = df.join(min_dates.date['start_date'])
#calulate relative time
df['time_after_start'] = (df['date']-df['start_date']).apply(lambda x: x.days+x.seconds/(24*60*60))
The expected result looks like this
user_id date time_after_start
A 2017-03-22 07:52:00 0.000000
A 2017-03-27 10:41:00 5.117361
B 2016-03-02 18:15:43 52.011644
B 2016-01-10 17:58:57 0.000000
Thanks so much for your help!
Upvotes: 1
Views: 27
Reputation: 862591
Use GroupBy.transform
with min
for Series
with same size like original DataFrame
, then convert timedeltas by Series.dt.total_seconds
and divide for day
s:
s = df.groupby('user_id')['date'].transform('min')
df['time_after_start'] = (df['date']-s).dt.total_seconds()/(24*60*60)
print (df)
user_id date time_after_start
0 A 2016-03-02 18:15:43 52.011644
1 A 2016-01-10 17:58:57 0.000000
2 B 2017-03-22 07:52:00 0.000000
3 B 2017-03-27 10:41:00 5.117361
Upvotes: 2