Bobby Heyer
Bobby Heyer

Reputation: 601

Merge Dataframes on nearest large number

Problem

I've got two dataframes;

  1. seller_df, showing how much product is available for sale at a given price point
  2. buyer_df, showing how much market demand there is.

I want to merge seller_df['price'] column to buyer_df based on the marginal quantity value (i.e. nearest largest match)

What I've tried

I found this SO Link which got me close. however it finds the nearest number match, not the nearest larger number match.

Example

# Seller
seller_df = pd.DataFrame({
    'seller':    ['A',  'B',  'C',  'A',  'B',  'C'],
    'quantity':  [40,   30,   60,   30,   80,   900],
    'price':     [3.33, 3.50, 3.95, 4.25, 4.99, 5.25],
    'cum_avail': [40,   70,   130,  160,  240,  1140]
})

# Buyer
buyer_df = pd.DataFrame({
    'time':              ['t1', 't2', 't3', 't4', 't5', 't6', 't7', 't8', 't9', 't10'],
    'quantity_required': [26,   563,  120,  233,  11,   198,  756,  64,   116,  224]
})

# What I've tried, based on SO link
S = seller_df.assign(key=1)
B = buyer_df.assign(key=1)

merged_SB = pd.merge(S, B, on='key', suffixes=('_S', '_B'))
M = merged_SB.groupby('cum_avail').apply(lambda x: abs(x['cum_avail']-x['quantity_required']) == abs(x['cum_avail']-x['quantity_required']).min())

merged_SB = merged_SB[M.values].drop_duplicates().drop('key', axis=1)

Desired Output

  time  quantity_required  price
0   t1                 26   3.33
1   t2                563   5.25
2   t3                120   3.95
3   t4                233   4.99
4   t5                 11   3.33
5   t6                198   4.99
6   t7                756   5.25
7   t8                 64   3.50
8   t9                116   3.95
9  t10                224   4.99

Upvotes: 1

Views: 41

Answers (1)

Scott Boston
Scott Boston

Reputation: 153510

Let's use pd.merge_asof:

pd.merge_asof(B.sort_values('quantity_required'), S, 
              right_on='cum_avail', 
              left_on='quantity_required', direction='forward' )[['time', 'quantity_required', 'price']]

Output:

  time  quantity_required  price
0   t5                 11   3.33
1   t1                 26   3.33
2   t8                 64   3.50
3   t9                116   3.95
4   t3                120   3.95
5   t6                198   4.99
6  t10                224   4.99
7   t4                233   4.99
8   t2                563   5.25
9   t7                756   5.25

Upvotes: 3

Related Questions