Reputation: 79
I've been trying to find the most efficient way to do this. Say I have a DataFrame df1 that looks like:
   time_start    time_end
0  1548102229  1548102232
1  1548102239  1548102242
2  1548102249  1548102252
3  1548102259  1548102262
and another DataFrame df2 that looks like:
    timestamp state
0  1548102231     A
1  1548102241     A
2  1548102248     B
3  1548102251     B
Is there a way to add 'state' to df1 on the condition that df2['timestamp'] falls between df1['time_start'] and df1['time_end'], arriving at:
   time_start    time_end state
0  1548102229  1548102232     A
1  1548102239  1548102242     A
2  1548102249  1548102252   N/A
3  1548102259  1548102262     B
Upvotes: 2
Views: 89
Reputation: 323376
Using IntervalIndex and get_indexer, then assigning back with .loc:
import pandas as pd

idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2.timestamp)
df1['New'] = df2.loc[indexmatch, 'state'].values
df1
   time_start    time_end  New
0  1548102229  1548102232    A
1  1548102239  1548102242    A
2  1548102249  1548102252  NaN
3  1548102259  1548102262    B
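A note for newer pandas: `.loc` with labels that are not in the index (here the -1 that get_indexer returns for an unmatched timestamp) raises a KeyError since pandas 1.0. `reindex` keeps the old fill-with-NaN behaviour. A self-contained sketch using the question's sample data:

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'time_start': [1548102229, 1548102239, 1548102249, 1548102259],
                    'time_end':   [1548102232, 1548102242, 1548102252, 1548102262]})
df2 = pd.DataFrame({'timestamp': [1548102231, 1548102241, 1548102248, 1548102251],
                    'state': ['A', 'A', 'B', 'B']})

idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2['timestamp'])   # -1 where no interval contains the timestamp

# reindex fills the missing label (-1) with NaN instead of raising KeyError
df1['New'] = df2['state'].reindex(indexmatch).values
```

This reproduces the output above (A, A, NaN, B) on current pandas versions.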
Update
idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2.timestamp)
dfcopy = df1.copy()
df1 = df1.loc[indexmatch]
df1['New'] = df2.loc[indexmatch, 'state'].values
df1.groupby(['time_start', 'time_end'], as_index=False).New.sum().combine_first(dfcopy)
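The update appears to handle several timestamps landing in the same interval by concatenating their states via the groupby sum. A sketch of the same idea on current pandas, using the question's sample data (the `','.join` aggregator is an assumption about how multiple states should be combined):

```python
import pandas as pd

df1 = pd.DataFrame({'time_start': [1548102229, 1548102239, 1548102249, 1548102259],
                    'time_end':   [1548102232, 1548102242, 1548102252, 1548102262]})
df2 = pd.DataFrame({'timestamp': [1548102231, 1548102241, 1548102248, 1548102251],
                    'state': ['A', 'A', 'B', 'B']})

idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
pos = idx.get_indexer(df2['timestamp'])
mask = pos != -1

# Combine the states of all timestamps that fall in the same interval
states = df2.loc[mask, 'state'].groupby(pos[mask]).agg(','.join)
df1['New'] = df1.index.map(states)
```

Note that, like the update, this attaches each state to the interval that actually contains its timestamp, so with this data the unmatched (NaN) row is the last one.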
Upvotes: 3
Reputation: 59284
Using the np.less_equal and np.greater_equal outer ufuncs to compare every timestamp against every interval bound:
import numpy as np

c = np.less_equal.outer(df2.timestamp, df1.time_end) & \
    np.greater_equal.outer(df2.timestamp, df1.time_start)
df1['state'] = df2.state.values[c.argmax(1)]
Then correct the rows where every comparison is False, i.e. timestamps that fall in no interval:
df1.loc[~c.any(1), 'state'] = np.nan
   time_start    time_end state
0  1548102229  1548102232     A
1  1548102239  1548102242     A
2  1548102249  1548102252   NaN
3  1548102259  1548102262     B
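To see this end to end on the question's sample data: newer pandas no longer accepts Series in `ufunc.outer`, so the sketch below passes plain NumPy arrays via `.to_numpy()`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'time_start': [1548102229, 1548102239, 1548102249, 1548102259],
                    'time_end':   [1548102232, 1548102242, 1548102252, 1548102262]})
df2 = pd.DataFrame({'timestamp': [1548102231, 1548102241, 1548102248, 1548102251],
                    'state': ['A', 'A', 'B', 'B']})

ts = df2['timestamp'].to_numpy()

# c[i, j] is True when df2 timestamp i falls inside df1 interval j
c = np.less_equal.outer(ts, df1['time_end'].to_numpy()) & \
    np.greater_equal.outer(ts, df1['time_start'].to_numpy())

df1['state'] = df2['state'].values[c.argmax(1)]
df1.loc[~c.any(1), 'state'] = np.nan   # rows whose comparison row is all False
```

Note that c is a len(df2) x len(df1) boolean matrix, so this approach trades memory for speed on large frames.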
Upvotes: 0