user13556745
user13556745

Reputation: 49

How to subtract every time from the start time

I have a dataframe "user1" with around 100000 rows of which some of the entries are as follows :

    user                    timestamp        x          y          z       class   
0      0x11      2018-04-10 12:37:51.252     -0.86      6.51        1.28        walk
1      0x11      2018-04-10 12:37:51.252     -3.16      9.52        2.98        walk
2      0x11      2018-04-10 12:37:51.324     -3.31      8.94        2.24        walk 

Instead of full time, I want the difference between current time and the start time i.e suppose in 3rd row the timestamp is 2018-04-10 12:37:51.252. I would like to change it to (2018-04-10 12:37:51.252 - 2018-04-10 12:37:51.324 = 0.72 secs). How to do this with pandas. All the other column entries should remain as it is.

Upvotes: 2

Views: 282

Answers (2)

Umar.H
Umar.H

Reputation: 23099

ensuring your datetime column is a datetime is the first step,

import numpy as np 
import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp'])

next, we grab the earliest timestamp record per user with transform

the benefit of transform is that it allows one to use groupby methods without modifying the index, very handy.

ts = df.groupby(['user'])['timestamp'].transform('min')

then return the delta as seconds to match your target output.

df['timedelta'] = (df['timestamp'] - ts) / np.timedelta64(1,'s')

print(df)

   user               timestamp     x     y     z class  timedelta
0  0x11 2018-04-10 12:37:51.252 -0.86  6.51  1.28  walk      0.000
1  0x11 2018-04-10 12:37:51.252 -3.16  9.52  2.98  walk      0.000
2  0x11 2018-04-10 12:37:51.324 -3.31  8.94  2.24  walk      0.072

Upvotes: 3

jezrael
jezrael

Reputation: 863801

If need first values per groups use DataFrame.transform with GroupBy.first, subtract by Series.sub and convert timedeltas to seconds by Series.dt.total_seconds:

df['timestamp'] = pd.to_datetime(df['timestamp'])

#if data are sorted
s = df.groupby('user')['timestamp'].transform('first')
#if data are not sorted and need minimal
#s = df.groupby('user')['timestamp'].transform('min')
df['diff'] = df['timestamp'].sub(s).dt.total_seconds()
print (df)
   user               timestamp     x     y     z class   diff
0  0x11 2018-04-10 12:37:51.252 -0.86  6.51  1.28  walk  0.000
1  0x11 2018-04-10 12:37:51.252 -3.16  9.52  2.98  walk  0.000
2  0x11 2018-04-10 12:37:51.324 -3.31  8.94  2.24  walk  0.072

EDIT:

If need grouping by user and class column:

df['timestamp'] = pd.to_datetime(df['timestamp'])

#if data are sorted
s = df.groupby(['user','class'])['timestamp'].transform('first')
#if data are not sorted and need minimal
#s = df.groupby(['user','class'])['timestamp'].transform('min')
df['diff'] = df['timestamp'].sub(s).dt.total_seconds()
print (df)
   user               timestamp     x     y     z class   diff
0  0x11 2018-04-10 12:37:51.252 -0.86  6.51  1.28  walk  0.000
1  0x11 2018-04-10 12:37:51.252 -3.16  9.52  2.98  walk  0.000
2  0x11 2018-04-10 12:37:51.324 -3.31  8.94  2.24  walk  0.072

Upvotes: 2

Related Questions