Srirekha Srirekha
Srirekha Srirekha

Reputation: 83

merging two pandas dataframes on nearest time stamp

I have two daframes df1 and df2

df1 is

time                  status
2/2/2015 8.00 am      on time
2/2/2015 9.00 am      canceled
2/2/2015 10.30 am     on time
2/2/2015 12.45 pm     on time

df2 is

 w_time                 temp
 2/2/2015 8.00 am      45
 2/2/2015 8.50 am      46
 2/2/2015 9.40 am      47
 2/2/2015 10.15 am     47
 2/2/2015 10.35 am     48
 2/2/2015 12.00 pm     48
 2/2/2015 1.00 pm      49

Now i want merge two data frames in such way that the second time stamp is always closer or equal to the first timestamp

the result should be

time              status     w_time              temp

2/2/2015 8.00 am  on time    2/2/2015 8.00 am     45

2/2/2015 9.00 am  canceled   2/2/2015 8.50 am     46

2/2/2015 10.30 am   on time    2/2/2015 10.35 am   48
2/2/2015 12.45 pm   on time    2/2/2015 1.00 pm    49

Upvotes: 5

Views: 7043

Answers (1)

Andy Hayden
Andy Hayden

Reputation: 375377

First ensure that the date columns are datetime64 columns.

df1['time'] = pd.to_datetime(df1['time'].str.replace(".", ":"))
df2['w_time'] = pd.to_datetime(df2['w_time'].str.replace(".", ":"))

If you set these as DatetimeIndexs can then use reindex with the 'nearest' method:

In [11]: df1 = df1.set_index("time")

In [12]: df2 = df2.set_index("w_time", drop=False)

In [13]: df1
Out[13]:
                       status
time
2015-02-02 08:00:00   on time
2015-02-02 09:00:00  canceled
2015-02-02 10:30:00   on time
2015-02-02 12:45:00   on time

In [14]: df2
Out[14]:
                     temp              w_time
w_time
2015-02-02 08:00:00    45 2015-02-02 08:00:00
2015-02-02 08:50:00    46 2015-02-02 08:50:00
2015-02-02 09:40:00    47 2015-02-02 09:40:00
2015-02-02 10:15:00    47 2015-02-02 10:15:00
2015-02-02 10:35:00    48 2015-02-02 10:35:00
2015-02-02 12:00:00    48 2015-02-02 12:00:00
2015-02-02 13:00:00    49 2015-02-02 13:00:00

With the following:

In [15]: df2.reindex(df1.index, method='nearest')
Out[15]:
                     temp              w_time
time
2015-02-02 08:00:00    45 2015-02-02 08:00:00
2015-02-02 09:00:00    46 2015-02-02 08:50:00
2015-02-02 10:30:00    48 2015-02-02 10:35:00
2015-02-02 12:45:00    49 2015-02-02 13:00:00

Then add these columns/join back to df1.

Upvotes: 8

Related Questions