Reputation: 1538
I have two Series of pd.Timestamp values that are extremely close to each other. I'd like to get the elementwise difference between the two Series with nanosecond precision.
First Series:
0 2021-05-21 00:02:11.349001429
1 2021-05-21 00:02:38.195857153
2 2021-05-21 00:03:25.527530228
3 2021-05-21 00:03:26.653410069
4 2021-05-21 00:03:26.798157366
Second Series:
0 2021-05-21 00:02:11.348997322
1 2021-05-21 00:02:38.195852267
2 2021-05-21 00:03:25.527526087
3 2021-05-21 00:03:26.653406759
4 2021-05-21 00:03:26.798154350
Now if I simply use the - operator, the nanosecond part of the difference gets truncated. It shows something like this:
Series1 - Series2
0 00:00:00.000004
1 00:00:00.000004
2 00:00:00.000004
3 00:00:00.000003
4 00:00:00.000003
I don't want to lose the nanosecond precision when calculating the differences between Timestamps. I have hacked up a solution that involves doing a for loop over each row, and calculating the scalar difference in pd.Timedelta, then getting the microseconds and nanoseconds out of that. Like this (for the first element):
single_diff = Series1[0] - Series2[0]
single_diff.microseconds * 1000 + single_diff.nanoseconds
4107
Is there a neater vectorized way to do this, instead of a for loop?
Upvotes: 2
Views: 1000
Reputation: 715
You can also get the nanoseconds without numpy, like this:
import pandas as pd
s1 = pd.Series(
pd.to_datetime(
[
"2021-05-21 00:02:11.349001429",
"2021-05-21 00:02:38.195857153",
"2021-05-21 00:03:25.527530228",
"2021-05-21 00:03:26.653410069",
"2021-05-21 00:03:26.798157366",
]
)
)
s2 = pd.Series(
pd.to_datetime(
[
"2021-05-21 00:02:11.348997322",
"2021-05-21 00:02:38.195852267",
"2021-05-21 00:03:25.527526087",
"2021-05-21 00:03:26.653406759",
"2021-05-21 00:03:26.798154350",
]
)
)
# before pandas 1.5.0 (Timedelta.delta was deprecated in 1.5.0 and removed in 2.0)
(s1 - s2).apply(lambda x: x.delta)
# 0 4107
# 1 4886
# 2 4141
# 3 3310
# 4 3016
# dtype: int64
# since pandas 1.5.0
(s1 - s2).apply(lambda x: x.value)
# 0 4107
# 1 4886
# 2 4141
# 3 3310
# 4 3016
# dtype: int64
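Since the question asks for a vectorized approach and .apply still calls a Python function per row, here is a minimal sketch of one alternative that also avoids numpy: floor-dividing the timedelta Series by a one-nanosecond pd.Timedelta. The name diff_ns is only illustrative, and this assumes the differences are stored at nanosecond resolution, as they are for the sample data.

```python
import pandas as pd

s1 = pd.Series(pd.to_datetime([
    "2021-05-21 00:02:11.349001429",
    "2021-05-21 00:02:38.195857153",
    "2021-05-21 00:03:25.527530228",
    "2021-05-21 00:03:26.653410069",
    "2021-05-21 00:03:26.798157366",
]))
s2 = pd.Series(pd.to_datetime([
    "2021-05-21 00:02:11.348997322",
    "2021-05-21 00:02:38.195852267",
    "2021-05-21 00:03:25.527526087",
    "2021-05-21 00:03:26.653406759",
    "2021-05-21 00:03:26.798154350",
]))

# Dividing a timedelta Series by a 1 ns Timedelta gives whole nanoseconds
# as integers, computed for the whole Series at once (no per-row lambda).
diff_ns = (s1 - s2) // pd.Timedelta(1, unit="ns")
print(diff_ns.tolist())
```

Changing the divisor (for example pd.Timedelta(1, unit="us")) would give whole microseconds instead, at the cost of dropping the sub-microsecond part.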
Upvotes: 1
Reputation: 25564
You won't lose precision if you work with timedelta as shown; only the printed output is shortened. The internal representation here is nanoseconds, so after calculating the timedelta you can convert it to integer to obtain the difference in nanoseconds. Example:
import pandas as pd
import numpy as np
s1 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.349001429",
"2021-05-21 00:02:38.195857153",
"2021-05-21 00:03:25.527530228",
"2021-05-21 00:03:26.653410069",
"2021-05-21 00:03:26.798157366"]))
s2 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.348997322",
"2021-05-21 00:02:38.195852267",
"2021-05-21 00:03:25.527526087",
"2021-05-21 00:03:26.653406759",
"2021-05-21 00:03:26.798154350"]))
delta = (s1-s2).astype(np.int64)
delta
0 4107
1 4886
2 4141
3 3310
4 3016
dtype: int64
Note: I'm using numpy's int64 type here because on some systems the built-in int results in 32-bit integers, in which case the conversion fails.
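To check that the nanoseconds are still stored even when the printed timedelta looks shortened, the components can be read through the .dt accessor. This is a rough vectorized version of the per-row calculation from the question; the names micro, nano and total_ns are illustrative, and the recombination assumes non-negative differences smaller than one second.

```python
import pandas as pd

s1 = pd.Series(pd.to_datetime([
    "2021-05-21 00:02:11.349001429",
    "2021-05-21 00:02:38.195857153",
    "2021-05-21 00:03:25.527530228",
    "2021-05-21 00:03:26.653410069",
    "2021-05-21 00:03:26.798157366",
]))
s2 = pd.Series(pd.to_datetime([
    "2021-05-21 00:02:11.348997322",
    "2021-05-21 00:02:38.195852267",
    "2021-05-21 00:03:25.527526087",
    "2021-05-21 00:03:26.653406759",
    "2021-05-21 00:03:26.798154350",
]))

delta = s1 - s2

# Sub-second components of each timedelta, read for the whole Series at once.
micro = delta.dt.microseconds  # microseconds part (0 to 999999)
nano = delta.dt.nanoseconds    # nanoseconds part (0 to 999)

# Same arithmetic as the scalar version in the question, but vectorized.
total_ns = micro * 1000 + nano
print(total_ns.tolist())
```

For differences of a second or more, or negative ones, converting the whole timedelta to an integer as above is the safer route.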
Upvotes: 2