C. Hale
C. Hale

Reputation: 145

Pandas merge, scale, and pivot long-form and wide-form dataframes

I have two Pandas dataframes that I need to merge. The first is a long-form data set containing my selling prices for items at various quantity breaks. Prices decrease as the quantity of parts being purchased increases.

Dataframe1

PART#    MY_QTY   MY_PRC
Item1    1        $20
Item1    10       $18
Item1    20       $17
Item2    1        $120
Item2    30       $100
Item2    50       $95

The second is a wide-form data set containing multiple vendor's quantity breaks and selling prices. For Item1 below if I buy 1 piece from Vend1 I pay $10, 4 pcs is still $10, 5pcs is $8, etc. The number of quantity breaks varies by item and by vendor, and not all vendors sell all items.

Dataframe2

PART#    VEND#   QTY1  PRC1   QTY2   PRC2   QTY3   PRC3
Item1    Vend1    1    $10     5     $8      15    $7
Item1    Vend2    1    $15     11    $12     30    $11
Item1    Vend3    1    $20     10    $18
Item2    Vend1    1    $75     20    $60     30    $55
Item2    Vend2    1    $80     12    $70

I want to merge the dataframes so that I can compare my selling price at each of my quantity breaks with the vendor costs at the same quantities. The final dataframe will have the shape of a left merge on PART# with VEND#'s pivoted to columns.

The part I'm having difficulty with is grabbing the correct vendor price based on MY_QTY. I should be able to read across a row and see what all parties are charging for an item at the given quantity. Expected output below.

Result Dataframe

PART#    MY_QTY   MY_PRC    VEND1    VEND2    VEND3
Item1    1        $20       $10      $15      $20
Item1    10       $18       $8       $15      $18
Item1    20       $17       $7       $12      $18
Item2    1        $120      $75      $80
Item2    30       $100      $55      $70
Item2    50       $95       $55      $70

EDIT

People seem to be getting confused with Dataframe2. This dataframe is read row-wise. The first row of values shows prices for Item1 sold by Vend1. For this row from QTY1 (1 piece) up to QTY2 (5 pieces) the price is PRC1 ($10), then from QTY2 (5 pieces) up to QTY3 (15 pieces) the price is PRC2 ($8). The price stays the same until the quantity requested hits the next quantity break.

Say Mama's Farm Stand is selling apples for $1 each. If you buy 5 apples then the price per apple drops to $.75. If you buy 15 apples then the price drops again to $.50. The dataframe for this example would look like below.

PART#    VEND#   QTY1  PRC1   QTY2   PRC2   QTY3   PRC3
Apple    Mama    1     $1     5      $.75   15     $.5

Upvotes: 1

Views: 361

Answers (3)

Keith
Keith

Reputation: 645

Here's another way that only uses loops over the vendor, but requires the data to be sorted

import pandas as pd
from io import StringIO
import numpy as np

df1_t = StringIO("""PART#,MY_QTY,MY_PRC
Item1,1,$20
Item1,10,$18
Item1,20,$17
Item2,1,$120
Item2,30,$100
Item2,50,$95
""")

df2_t = StringIO("""PART#,VEND#,QTY1,PRC1,QTY2,PRC2,QTY3,PRC3
Item1,Vend1,1,$10,5,$8,15,$7
Item1,Vend2,1,$15,11,$12,30,$11
Item1,Vend3,1,$20,10,$18
Item2,Vend1,1,$75,20,$60,30,$55
Item2,Vend2,1,$80,12,$70
""")

df1 = pd.read_csv(df1_t)
df2 = pd.read_csv(df2_t)


df2 = pd.wide_to_long(df2, ['QTY','PRC'], i='VEND#', j='v').set_index('QTY',append=True).reset_index().drop('v',
    axis=1)
df1['MY_QTY'] = df1['MY_QTY'].astype(float)
df1 = df1.sort_values(by="MY_QTY")
df2 = df2.sort_values(by="QTY")
df2 = df2.dropna(axis=0, how='any')

vendors = df2['VEND#'].unique()
df3=df1
for vendor in vendors:
    df3 = pd.merge_asof(df3, df2[df2['VEND#']==vendor], left_on="MY_QTY", right_on="QTY", by='PART#',suffixes=('', '_y'))

to_drop = [x for x in df3 if x.startswith('VEND')]
to_drop = to_drop + [x for x in df3 if x.startswith('QTY')]
df3.drop(to_drop, axis=1, inplace=True)
df3 = df3.rename(columns={prc : vendor for prc, vendor in zip([x for x in df3 if x.startswith('PRC')], vendors)})

print(df3)
#     PART#  MY_QTY MY_PRC Vend1 Vend3 Vend3
#0  Item1     1.0    $20   $10   $15   $20
#1  Item2     1.0   $120   $75   $80   NaN
#2  Item1    10.0    $18    $8   $15   $18
#3  Item1    20.0    $17    $7   $12   $18
#4  Item2    30.0   $100   $55   $70   NaN
#5  Item2    50.0    $95   $55   $70   NaN

Upvotes: 1

Keith
Keith

Reputation: 645

Here's a working example of how you might do it. This is by no means efficient. Other people seemed to be trying to join the two datasets, but it sounds like what you wanted was actually getting the price of the the largest QTY <= MY_QTY for each vendor / part combination.

import pandas as pd
from io import StringIO
import numpy as np

df1_t = StringIO("""PART#,MY_QTY,MY_PRC
Item1,1,$20
Item1,10,$18
Item1,20,$17
Item2,1,$120
Item2,30,$100
Item2,50,$95
""")

df2_t = StringIO("""PART#,VEND#,QTY1,PRC1,QTY2,PRC2,QTY3,PRC3
Item1,Vend1,1,$10,5,$8,15,$7
Item1,Vend2,1,$15,11,$12,30,$11
Item1,Vend3,1,$20,10,$18
Item2,Vend1,1,$75,20,$60,30,$55
Item2,Vend2,1,$80,12,$70
""")

df1 = pd.read_csv(df1_t)
df2 = pd.read_csv(df2_t)

vendors = df2['VEND#'].unique()
items = df2['PART#'].unique()

# for the specific item and vendor in the rows of Dataframe1 (df1), find the 
# largest QTY for that that's less than MY_QTY for the same combination of item
# and vendor in df2
def find_price(row, vendor, df2):
    item = row['PART#']
    quantity = row['MY_QTY']
    # get the row with that specific item / vendor combo
    prices = df2[(df2['PART#']==item) & (df2['VEND#']==vendor)]
    # reshape a little
    prices = pd.wide_to_long(prices, ['QTY','PRC'], i='VEND#', j='v').set_index('QTY',append=True).reset_index().drop('v',axis=1)
    # only get where QTY <= MY_QTY
    prices = prices[prices['QTY']<=quantity]
    if prices.empty:
        return np.nan
    else:
        return prices.loc[prices['QTY'].argmax(),:]['PRC']


# iterate throw the vendors, and use find_price to get the corresponding price
for vendor in vendors:
    df1[vendor] = df1.apply(lambda row: find_price(row, vendor, df2),axis=1)

print(df1)
#   PART#  MY_QTY MY_PRC Vend1 Vend2 Vend3
#0  Item1       1    $20   $10   $15   $20
#1  Item1      10    $18    $8   $15   $18
#2  Item1      20    $17    $7   $12   $18
#3  Item2       1   $120   $75   $80   NaN
#4  Item2      30   $100   $55   $70   NaN
#5  Item2      50    $95   $55   $70   NaN

Upvotes: 1

Yale Newman
Yale Newman

Reputation: 1231

dfs = []
for val in ['PRC1','PRC2','PRC3']:    
    temp = pd.pivot_table(df2, index='PART#', columns='VEND#', values=val).reset_index()
    dfs.append(temp)
pivot = pd.concat(dfs, axis=0)
pivot.sort_values('PART#',inplace=True)
pivot.reset_index(inplace=True)
df1.join(pivot,lsuffix='PART#')

Upvotes: 0

Related Questions