Reputation: 145
I have two Pandas dataframes that I need to merge. The first is a long-form data set containing my selling prices for items at various quantity breaks. Prices decrease as the quantity of parts being purchased increases.
Dataframe1
PART# MY_QTY MY_PRC
Item1 1 $20
Item1 10 $18
Item1 20 $17
Item2 1 $120
Item2 30 $100
Item2 50 $95
The second is a wide-form data set containing multiple vendor's quantity breaks and selling prices. For Item1 below if I buy 1 piece from Vend1 I pay $10, 4 pcs is still $10, 5pcs is $8, etc. The number of quantity breaks varies by item and by vendor, and not all vendors sell all items.
Dataframe2
PART# VEND# QTY1 PRC1 QTY2 PRC2 QTY3 PRC3
Item1 Vend1 1 $10 5 $8 15 $7
Item1 Vend2 1 $15 11 $12 30 $11
Item1 Vend3 1 $20 10 $18
Item2 Vend1 1 $75 20 $60 30 $55
Item2 Vend2 1 $80 12 $70
I want to merge the dataframes so that I can compare my selling price at each of my quantity breaks with the vendor costs at the same quantities. The final dataframe will have the shape of a left merge on PART# with VEND#'s pivoted to columns.
The part I'm having difficulty with is grabbing the correct vendor price based on MY_QTY. I should be able to read across a row and see what all parties are charging for an item at the given quantity. Expected output below.
Result Dataframe
PART# MY_QTY MY_PRC VEND1 VEND2 VEND3
Item1 1 $20 $10 $15 $20
Item1 10 $18 $8 $15 $18
Item1 20 $17 $7 $12 $18
Item2 1 $120 $75 $80
Item2 30 $100 $55 $70
Item2 50 $95 $55 $70
EDIT
People seem to be getting confused with Dataframe2. This dataframe is read row-wise. The first row of values shows prices for Item1 sold by Vend1. For this row from QTY1 (1 piece) up to QTY2 (5 pieces) the price is PRC1 ($10), then from QTY2 (5 pieces) up to QTY3 (15 pieces) the price is PRC2 ($8). The price stays the same until the quantity requested hits the next quantity break.
Say Mama's Farm Stand is selling apples for $1 each. If you buy 5 apples then the price per apple drops to $.75. If you buy 15 apples then the price drops again to $.50. The dataframe for this example would look like below.
PART# VEND# QTY1 PRC1 QTY2 PRC2 QTY3 PRC3
Apple Mama 1 $1 5 $.75 15 $.5
Upvotes: 1
Views: 361
Reputation: 645
Here's another way that only uses loops over the vendor, but requires the data to be sorted
import pandas as pd
from io import StringIO
import numpy as np
df1_t = StringIO("""PART#,MY_QTY,MY_PRC
Item1,1,$20
Item1,10,$18
Item1,20,$17
Item2,1,$120
Item2,30,$100
Item2,50,$95
""")
df2_t = StringIO("""PART#,VEND#,QTY1,PRC1,QTY2,PRC2,QTY3,PRC3
Item1,Vend1,1,$10,5,$8,15,$7
Item1,Vend2,1,$15,11,$12,30,$11
Item1,Vend3,1,$20,10,$18
Item2,Vend1,1,$75,20,$60,30,$55
Item2,Vend2,1,$80,12,$70
""")
df1 = pd.read_csv(df1_t)
df2 = pd.read_csv(df2_t)
df2 = pd.wide_to_long(df2, ['QTY','PRC'], i='VEND#', j='v').set_index('QTY',append=True).reset_index().drop('v',
axis=1)
df1['MY_QTY'] = df1['MY_QTY'].astype(float)
df1 = df1.sort_values(by="MY_QTY")
df2 = df2.sort_values(by="QTY")
df2 = df2.dropna(axis=0, how='any')
vendors = df2['VEND#'].unique()
df3=df1
for vendor in vendors:
df3 = pd.merge_asof(df3, df2[df2['VEND#']==vendor], left_on="MY_QTY", right_on="QTY", by='PART#',suffixes=('', '_y'))
to_drop = [x for x in df3 if x.startswith('VEND')]
to_drop = to_drop + [x for x in df3 if x.startswith('QTY')]
df3.drop(to_drop, axis=1, inplace=True)
df3 = df3.rename(columns={prc : vendor for prc, vendor in zip([x for x in df3 if x.startswith('PRC')], vendors)})
print(df3)
# PART# MY_QTY MY_PRC Vend1 Vend3 Vend3
#0 Item1 1.0 $20 $10 $15 $20
#1 Item2 1.0 $120 $75 $80 NaN
#2 Item1 10.0 $18 $8 $15 $18
#3 Item1 20.0 $17 $7 $12 $18
#4 Item2 30.0 $100 $55 $70 NaN
#5 Item2 50.0 $95 $55 $70 NaN
Upvotes: 1
Reputation: 645
Here's a working example of how you might do it. This is by no means efficient. Other people seemed to be trying to join the two datasets, but it sounds like what you wanted was actually getting the price of the the largest QTY <= MY_QTY
for each vendor / part combination.
import pandas as pd
from io import StringIO
import numpy as np
df1_t = StringIO("""PART#,MY_QTY,MY_PRC
Item1,1,$20
Item1,10,$18
Item1,20,$17
Item2,1,$120
Item2,30,$100
Item2,50,$95
""")
df2_t = StringIO("""PART#,VEND#,QTY1,PRC1,QTY2,PRC2,QTY3,PRC3
Item1,Vend1,1,$10,5,$8,15,$7
Item1,Vend2,1,$15,11,$12,30,$11
Item1,Vend3,1,$20,10,$18
Item2,Vend1,1,$75,20,$60,30,$55
Item2,Vend2,1,$80,12,$70
""")
df1 = pd.read_csv(df1_t)
df2 = pd.read_csv(df2_t)
vendors = df2['VEND#'].unique()
items = df2['PART#'].unique()
# for the specific item and vendor in the rows of Dataframe1 (df1), find the
# largest QTY for that that's less than MY_QTY for the same combination of item
# and vendor in df2
def find_price(row, vendor, df2):
item = row['PART#']
quantity = row['MY_QTY']
# get the row with that specific item / vendor combo
prices = df2[(df2['PART#']==item) & (df2['VEND#']==vendor)]
# reshape a little
prices = pd.wide_to_long(prices, ['QTY','PRC'], i='VEND#', j='v').set_index('QTY',append=True).reset_index().drop('v',axis=1)
# only get where QTY <= MY_QTY
prices = prices[prices['QTY']<=quantity]
if prices.empty:
return np.nan
else:
return prices.loc[prices['QTY'].argmax(),:]['PRC']
# iterate throw the vendors, and use find_price to get the corresponding price
for vendor in vendors:
df1[vendor] = df1.apply(lambda row: find_price(row, vendor, df2),axis=1)
print(df1)
# PART# MY_QTY MY_PRC Vend1 Vend2 Vend3
#0 Item1 1 $20 $10 $15 $20
#1 Item1 10 $18 $8 $15 $18
#2 Item1 20 $17 $7 $12 $18
#3 Item2 1 $120 $75 $80 NaN
#4 Item2 30 $100 $55 $70 NaN
#5 Item2 50 $95 $55 $70 NaN
Upvotes: 1
Reputation: 1231
dfs = []
for val in ['PRC1','PRC2','PRC3']:
temp = pd.pivot_table(df2, index='PART#', columns='VEND#', values=val).reset_index()
dfs.append(temp)
pivot = pd.concat(dfs, axis=0)
pivot.sort_values('PART#',inplace=True)
pivot.reset_index(inplace=True)
df1.join(pivot,lsuffix='PART#')
Upvotes: 0