Reputation: 161
understand that there is many similar question out there on subtraction for pandas column, but my scenario is rather unique
I have 2 sets of data which both includes datetime. My program requires me to match this 2 date time therefore I used merge_asof to match to the closest time. After matching the data, I will need to find the time difference between this 2 set of data.
EndTime,Datetime
3/10/2010 0:00:33, 3/10/2010 0:00:26
3/10/2010 0:01:15,
3/10/2010 0:01:30,
3/10/2010 0:02:09, 3/10/2010 0:01:36
3/10/2010 0:02:50,
3/10/2010 0:05:09,
3/10/2010 0:06:00, 3/10/2010 0:05:48
The UNIQUE part will be that I would like to use Datetime subtract 1row before of endtime, (e.g. last row 3/10/2010 0:05:48 - 2nd last row 3/10/2010 0:05:09 = 39seconds)
Expected outcome would be:
EndTime,Datetime,SecondsDiff
3/10/2010 0:00:33, 3/10/2010 0:00:26, *not sure how to compute this but not important for 1st row of data*
3/10/2010 0:01:15,,
3/10/2010 0:01:30,,
3/10/2010 0:02:09, 3/10/2010 0:01:36,6
3/10/2010 0:02:50,,
3/10/2010 0:05:09,,
3/10/2010 0:06:00, 3/10/2010 0:05:48,39
I've tried some methods that caused some error, please advice!
As when I matched data, it cause duplicate so I uses
dm.loc[(dm['Datetime'].notnull())&(dm.duplicated('Datetime')==True),'Datetime'] = ' '
to give empty space that doesnt cause duplicate but it give me "Error parsing datetime string " " at position 1"
-
I've also tried indicating the duplicates as 0 and converting the datetime type but causes the
error ufunc subtract cannot use operands with types dtype('O') anddtype('<M8[ns]')
, therefore I tried the below method but resulting to my datetime column becoming only date and the seconds difference was totally off
dm.loc[(dm['Datetime'].notnull())&(dm.duplicated('Datetime')==True),'Datetime'] = 0
dm['EndTime'] = dm['EndTime'].values.astype('datetime64')
dm['Datetime'] = dm['Datetime'].values.astype('datetime64')
dm['Seconds'] = (dm.Datetime -
dm.EndTime.shift(-1)).astype('timedelta64[s]')
Upvotes: 2
Views: 228
Reputation: 4233
IIUC you can use :
print (df)
EndTime Datetime
0 3/10/2010 0:00:33 3/10/2010 0:00:26
1 3/10/2010 0:01:15
2 3/10/2010 0:01:30 NaN
3 3/10/2010 0:02:09 3/10/2010 0:01:36
4 3/10/2010 0:02:50 NaN
5 3/10/2010 0:05:09 NaN
6 3/10/2010 0:06:00 3/10/2010 0:05:48
df.Datetime = pd.to_datetime(df.Datetime, errors='coerce')
df.EndTime = pd.to_datetime(df.EndTime, errors='coerce')
df['SecondsDiff'] = df.Datetime.sub(df.EndTime.shift()).dt.seconds
print (df)
EndTime Datetime SecondsDiff
0 2010-03-10 00:00:33 2010-03-10 00:00:26 NaN
1 2010-03-10 00:01:15 NaT NaN
2 2010-03-10 00:01:30 NaT NaN
3 2010-03-10 00:02:09 2010-03-10 00:01:36 6.0
4 2010-03-10 00:02:50 NaT NaN
5 2010-03-10 00:05:09 NaT NaN
6 2010-03-10 00:06:00 2010-03-10 00:05:48 39.0
Upvotes: 1