Reputation: 79
I've been trying to find the most efficient way to do this. Say I have a DataFrame df1 that looks like:
   time_start    time_end
0  1548102229  1548102232
1  1548102239  1548102242
2  1548102249  1548102252
3  1548102259  1548102262
and another DataFrame df2 that looks like:
    timestamp state
0  1548102231     A
1  1548102241     A
2  1548102248     B
3  1548102251     B
Is there a way to add 'state' to df1 on the condition that df2['timestamp'] falls between df1['time_start'] and df1['time_end'], arriving at:
   time_start    time_end state
0  1548102229  1548102232     A
1  1548102239  1548102242     A
2  1548102249  1548102252   N/A
3  1548102259  1548102262     B
Upvotes: 2
Views: 89
Reputation: 323376
Using IntervalIndex and get_indexer, then assigning back with .loc:
import pandas as pd

idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2.timestamp)
df1['New'] = df2.loc[indexmatch, 'state'].values
df1
   time_start    time_end  New
0  1548102229  1548102232    A
1  1548102239  1548102242    A
2  1548102249  1548102252  NaN
3  1548102259  1548102262    B
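A note for newer pandas: `.loc` with labels that are not in the index (here the -1 that get_indexer returns for an unmatched timestamp) raises a KeyError since pandas 1.0. `reindex` keeps the old fill-with-NaN behaviour. A self-contained sketch using the question's sample data:

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'time_start': [1548102229, 1548102239, 1548102249, 1548102259],
                    'time_end':   [1548102232, 1548102242, 1548102252, 1548102262]})
df2 = pd.DataFrame({'timestamp': [1548102231, 1548102241, 1548102248, 1548102251],
                    'state': ['A', 'A', 'B', 'B']})

idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2['timestamp'])   # -1 where no interval contains the timestamp

# reindex fills the missing label (-1) with NaN instead of raising KeyError
df1['New'] = df2['state'].reindex(indexmatch).values
```

This reproduces the output above (A, A, NaN, B) on current pandas versions.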
Update
idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2.timestamp)
dfcopy = df1.copy()
df1 = df1.loc[indexmatch]
df1['New'] = df2.loc[indexmatch, 'state'].values
df1.groupby(['time_start', 'time_end'], as_index=False).New.sum().combine_first(dfcopy)
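The update appears to handle several timestamps landing in the same interval by concatenating their states via the groupby sum. A sketch of the same idea on current pandas, using the question's sample data (the `','.join` aggregator is an assumption about how multiple states should be combined):

```python
import pandas as pd

df1 = pd.DataFrame({'time_start': [1548102229, 1548102239, 1548102249, 1548102259],
                    'time_end':   [1548102232, 1548102242, 1548102252, 1548102262]})
df2 = pd.DataFrame({'timestamp': [1548102231, 1548102241, 1548102248, 1548102251],
                    'state': ['A', 'A', 'B', 'B']})

idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
pos = idx.get_indexer(df2['timestamp'])
mask = pos != -1

# Combine the states of all timestamps that fall in the same interval
states = df2.loc[mask, 'state'].groupby(pos[mask]).agg(','.join)
df1['New'] = df1.index.map(states)
```

Note that, like the update, this attaches each state to the interval that actually contains its timestamp, so with this data the unmatched (NaN) row is the last one.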
Upvotes: 3
Reputation: 59284
Using the np.less_equal and np.greater_equal outer ufuncs to compare every timestamp against every interval bound:
import numpy as np

c = np.less_equal.outer(df2.timestamp, df1.time_end) & \
    np.greater_equal.outer(df2.timestamp, df1.time_start)
df1['state'] = df2.state.values[c.argmax(1)]
Then correct the rows where every comparison is False, i.e. timestamps that fall in no interval:
df1.loc[~c.any(1), 'state'] = np.nan
   time_start    time_end state
0  1548102229  1548102232     A
1  1548102239  1548102242     A
2  1548102249  1548102252   NaN
3  1548102259  1548102262     B
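To see this end to end on the question's sample data: newer pandas no longer accepts Series in `ufunc.outer`, so the sketch below passes plain NumPy arrays via `.to_numpy()`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'time_start': [1548102229, 1548102239, 1548102249, 1548102259],
                    'time_end':   [1548102232, 1548102242, 1548102252, 1548102262]})
df2 = pd.DataFrame({'timestamp': [1548102231, 1548102241, 1548102248, 1548102251],
                    'state': ['A', 'A', 'B', 'B']})

ts = df2['timestamp'].to_numpy()

# c[i, j] is True when df2 timestamp i falls inside df1 interval j
c = np.less_equal.outer(ts, df1['time_end'].to_numpy()) & \
    np.greater_equal.outer(ts, df1['time_start'].to_numpy())

df1['state'] = df2['state'].values[c.argmax(1)]
df1.loc[~c.any(1), 'state'] = np.nan   # rows whose comparison row is all False
```

Note that c is a len(df2) x len(df1) boolean matrix, so this approach trades memory for speed on large frames.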
Upvotes: 0