Reputation: 331
I have a pandas dataframe of a pretty high length (100,000+), and I want to merge this dataframe with another based on closest element (by index) to a different dataframe object. For example, let us say that object 1 is the following:
DateTime x_1 y_1
2016-07-05 08:30:00 1 2
2016-07-05 08:30:01 3 4
2016-07-05 08:30:02 5 6
2016-07-05 08:30:03 7 8
2016-07-05 08:30:05 9 10
2016-07-05 08:30:11 11 12
and object 2 is the following:
DateTime x_2 y_2
2016-07-05 08:30:00 13 14
2016-07-05 08:30:05 15 16
2016-07-05 08:30:05 17 18
2016-07-05 08:30:10 19 20
I want to return a new object, object3, which has the same length as object2, and now contains items from object1 which are closest to the times of object2. Object3, in this case, would look like this:
DateTime x_2 y_2 x_1 y_1
2016-07-05 08:30:00 13 14 1 2
2016-07-05 08:30:05 15 16 9 10
2016-07-05 08:30:05 17 18 9 10
2016-07-05 08:30:10 19 20 11 12
I currently do something along these lines to get this to work, but I get reindex errors (non-unique index):
object3 = object2.join(object1.reindex(object2.index, method='nearest'), how='left')
which returns a valueerror:
ValueError: cannot reindex a non-unique index with a method or limit
Upvotes: 2
Views: 903
Reputation: 33783
Do the reindex
on the unique values of the other index. You'll pick up the duplicates during the join.
object3 = object2.join(object1.reindex(object2.index.unique(), method='nearest'))
The resulting output:
x_2 y_2 x_1 y_1
DateTime
2016-07-05 08:30:00 13 14 1 2
2016-07-05 08:30:05 15 16 9 10
2016-07-05 08:30:05 17 18 9 10
2016-07-05 08:30:10 19 20 11 12
Upvotes: 3