Reputation: 214
I'm hoping to subset a df using specific timestamps plus an additional period of time. Using below, df
contains specific timestamps that I want to use to subset df2
. Essentially, I use the timestamps in df and determine the previous minute. These periods of time are then used to create individual df's, which are concatenated together to create the final df.
However, this is inefficient by itself, but becomes even more so when dealing with multiple times.
import pandas as pd
df = pd.DataFrame({
'Time' : ['2020-08-02 10:01:12.5','2020-08-02 11:01:12.5','2020-08-02 12:31:00.0','2020-08-02 12:41:22.6'],
'ID' : ['X','Y','B','X'],
})
# 1 min before timestamp
'2020-08-02 10:00:12.5'
# first timestamp
'2020-08-02 10:01:12.5'
# 1 min before timestamp
'2020-08-02 11:00:02.1'
# second timestamp
'2020-08-02 11:01:02.1'
df2 = pd.DataFrame({
'Time' : ['2020-08-02 10:00:00.1','2020-08-02 10:00:00.2','2020-08-02 10:00:00.3','2020-08-02 10:00:00.4'],
'ID' : ['','','',''],
})
d1 = df2[(df2['Time'] > '2020-08-02 10:00:12.5') & (df2['Time'] <= '2020-08-02 10:01:12.5')]
d2 = df2[(df2['Time'] > '2020-08-02 11:00:02.1') & (df2['Time'] <= '2020-08-02 11:01:02.1')]
df_out = pd.concat([d1,d2])#...include all separate periods of time
Intended Output:
Time ID
2020-08-02 10:00:12.5
2020-08-02 10:00:12.6
...
2020-08-02 11:01:12.5 X
2020-08-02 11:00:02.1
2020-08-02 11:00:02.2
...
2020-08-02 11:01:02.1 Y
Upvotes: 1
Views: 211
Reputation: 9941
There's the merge_asof
method in pandas that does just that.
Let me use slightly different timestamps compared to the original post to make it a bit easier to illustrate. I'll set df1
timestamps at 10:01
, 10:03
and 10:06
for the purpose of this example.
Let's add 1MinBefore
column to df
with the timestamp one minute before the Time
(we'll use it later to merge the dataframes):
df = pd.DataFrame({
'Time' : ['2020-08-02 10:01:00','2020-08-02 10:03:00','2020-08-02 10:06:00'],
'ID' : ['X','Y','Z'],
})
df['Time'] = pd.to_datetime(df['Time'])
df['1MinBefore'] = df['Time'] - pd.Timedelta('1min')
So our df
is:
Time ID 1MinBefore
0 2020-08-02 10:01:00 X 2020-08-02 10:00:00
1 2020-08-02 10:03:00 Y 2020-08-02 10:02:00
2 2020-08-02 10:06:00 Z 2020-08-02 10:05:00
Let's use the range between 10:00
and 10:07
with 30 second intervals for df2
:
df2 = pd.DataFrame({
'Time' : pd.date_range(
start='2020-08-02 10:00:00',
end='2020-08-02 10:07:00',
freq='30s'),
'ID' : '',
})
And now the key step, merging these dataframes with merge_asof
:
pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
left_on='Time', right_on='1MinBefore',
tolerance=pd.Timedelta('1min')
Output:
Time ID 1MinBefore
0 2020-08-02 10:00:00 X 2020-08-02 10:00:00
1 2020-08-02 10:00:30 X 2020-08-02 10:00:00
2 2020-08-02 10:01:00 X 2020-08-02 10:00:00
3 2020-08-02 10:01:30 NaN NaT
4 2020-08-02 10:02:00 Y 2020-08-02 10:02:00
5 2020-08-02 10:02:30 Y 2020-08-02 10:02:00
6 2020-08-02 10:03:00 Y 2020-08-02 10:02:00
7 2020-08-02 10:03:30 NaN NaT
8 2020-08-02 10:04:00 NaN NaT
9 2020-08-02 10:04:30 NaN NaT
10 2020-08-02 10:05:00 Z 2020-08-02 10:05:00
11 2020-08-02 10:05:30 Z 2020-08-02 10:05:00
12 2020-08-02 10:06:00 Z 2020-08-02 10:05:00
13 2020-08-02 10:06:30 NaN NaT
14 2020-08-02 10:07:00 NaN NaT
The tolerance
parameter of 1 minute basically tells it that values in df
'older' than 1 minute should be disregarded.
Now we can of course drop 1MinBefore
column and use fillna
on the ID
column to make it look exactly like the Intended Output
in the original post.
Upvotes: 2