00__00__00
00__00__00

Reputation: 5357

keep both merging keys after pandas.merge_asof

I have found this nice function pandas.merge_asof. From the documentation

pandas.merge_asof(left, right, on=None, left_on=None, right_on=None)

Parameters: 

left : DataFrame
right : DataFrame
on : label

Field name to join on. Must be found in both DataFrames.
The data MUST be ordered. 
Furthermore this must be a numeric column,such as datetimelike, integer, or float. 
On or left_on/right_on must be given.

and it works as expected.

However, my merged dataframe keeps as columns on only the one that originally was in left. I would need to keep them both, so to have

   mydf=pandas.merge_asof(left, right, on='Time')

and mydf to contain both Time from left and right

Example data:

a=pd.DataFrame(data=pd.date_range('20100201', periods=100, freq='6h3min'),columns=['Time'])
b=pd.DataFrame(data=
                  pd.date_range('20100201', periods=24, freq='1h'),columns=['Time'])
b['val']=range(b.shape[0])
out=pd.merge_asof(a,b,on='Time',direction='forward',tolerance=pd.Timedelta('30min'))

Upvotes: 22

Views: 8718

Answers (2)

Mohit Lunia
Mohit Lunia

Reputation: 21

One trick that I found while working on time-series data is this function. To articulate what I have done is perform an outer join on both dataframes. Suppose the entry against some "Date and Time" is not in the other dataframe. In that case, we copied the value it had previously, as I assumed that the value in the other dataframe stayed whatever it was before that time.

def merger(df1,df2):
    
    df1['Date and Time'] = pd.to_datetime(df1['Date and Time'], errors='coerce')
    df2['Date and Time'] = pd.to_datetime(df2['Date and Time'], errors='coerce') 
   

    df = pd.merge(df1,df2,on = 'Date and Time', how='outer')
    df.fillna(method = 'ffill',inplace = True)
    
    return df

Hope it Helps.

Upvotes: 2

jezrael
jezrael

Reputation: 863341

I think one possible solution is rename columns:

out = pd.merge_asof(a.rename(columns={'Time':'Time1'}), 
                    b.rename(columns={'Time':'Time2'}), 
                    left_on='Time1',
                    right_on='Time2',
                    direction='forward',
                    tolerance=pd.Timedelta('30min'))

print (out.head())
                Time1      Time2  val
0 2010-02-01 00:00:00 2010-02-01  0.0
1 2010-02-01 06:03:00        NaT  NaN
2 2010-02-01 12:06:00        NaT  NaN
3 2010-02-01 18:09:00        NaT  NaN
4 2010-02-02 00:12:00        NaT  NaN

Upvotes: 24

Related Questions