Reputation: 49
I have a dataframe "user1" with around 100000 rows of which some of the entries are as follows :
user timestamp x y z class
0 0x11 2018-04-10 12:37:51.252 -0.86 6.51 1.28 walk
1 0x11 2018-04-10 12:37:51.252 -3.16 9.52 2.98 walk
2 0x11 2018-04-10 12:37:51.324 -3.31 8.94 2.24 walk
Instead of full time, I want the difference between current time and the start time i.e suppose in 3rd row the timestamp is 2018-04-10 12:37:51.252
. I would like to change it to (2018-04-10 12:37:51.252 - 2018-04-10 12:37:51.324 = 0.72 secs)
. How to do this with pandas. All the other column entries should remain as it is.
Upvotes: 2
Views: 282
Reputation: 23099
ensuring your datetime column is a datetime is the first step,
import numpy as np
import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp'])
next, we grab the earliest timestamp record per user with transform
the benefit of transform is that it allows one to use groupby
methods without modifying the index, very handy.
ts = df.groupby(['user'])['timestamp'].transform('min')
then return the delta as seconds to match your target output.
df['timedelta'] = (df['timestamp'] - ts) / np.timedelta64(1,'s')
print(df)
user timestamp x y z class timedelta
0 0x11 2018-04-10 12:37:51.252 -0.86 6.51 1.28 walk 0.000
1 0x11 2018-04-10 12:37:51.252 -3.16 9.52 2.98 walk 0.000
2 0x11 2018-04-10 12:37:51.324 -3.31 8.94 2.24 walk 0.072
Upvotes: 3
Reputation: 863801
If need first values per groups use DataFrame.transform
with GroupBy.first
, subtract by Series.sub
and convert timedeltas to seconds by Series.dt.total_seconds
:
df['timestamp'] = pd.to_datetime(df['timestamp'])
#if data are sorted
s = df.groupby('user')['timestamp'].transform('first')
#if data are not sorted and need minimal
#s = df.groupby('user')['timestamp'].transform('min')
df['diff'] = df['timestamp'].sub(s).dt.total_seconds()
print (df)
user timestamp x y z class diff
0 0x11 2018-04-10 12:37:51.252 -0.86 6.51 1.28 walk 0.000
1 0x11 2018-04-10 12:37:51.252 -3.16 9.52 2.98 walk 0.000
2 0x11 2018-04-10 12:37:51.324 -3.31 8.94 2.24 walk 0.072
EDIT:
If need grouping by user
and class
column:
df['timestamp'] = pd.to_datetime(df['timestamp'])
#if data are sorted
s = df.groupby(['user','class'])['timestamp'].transform('first')
#if data are not sorted and need minimal
#s = df.groupby(['user','class'])['timestamp'].transform('min')
df['diff'] = df['timestamp'].sub(s).dt.total_seconds()
print (df)
user timestamp x y z class diff
0 0x11 2018-04-10 12:37:51.252 -0.86 6.51 1.28 walk 0.000
1 0x11 2018-04-10 12:37:51.252 -3.16 9.52 2.98 walk 0.000
2 0x11 2018-04-10 12:37:51.324 -3.31 8.94 2.24 walk 0.072
Upvotes: 2