Melt on multiple levels in Pandas

Question

I have a DataFrame with a part number, a year and the quantity consumed per month, as shown below. The columns 01, 02 and 03 hold the quantities for January to March of each year.

import pandas as pd

d = {'PN': [10506,10506,10507,10507],
 'Year': [2017, 2018, 2017, 2018],
 '01': [1,2,3,4],
 '02': [5,6,7,8],
 '03': [9,10,11,12]}
indata = pd.DataFrame(data = d)

I would like to restructure it into long format by combining Year and Month into a YYYYMM value, so that each row holds one Part Number, YearMonth and Qty, as below.

dd = {'PN': [10506,10506,10506,10506,10506,10506,10507,10507,10507,10507,10507,10507],
  'YearMonth': [201701,201702,201703,201801,201802,201803,201701,201702,201703,201801,201802,201803],
  'Qty': [1,5,9,2,6,10,3,7,11,4,8,12]}
outdata = pd.DataFrame(data = dd)

Since I could not get pd.melt to work, I tried three nested for loops instead, as below.

parts = pd.Series(indata['PN']).unique()
years = pd.Series(indata['Year']).unique()
months = ['01', '02', '03']

df = pd.DataFrame(columns = ['PN', 'YearMonth', 'Qty'])

for p in parts:
    for y in years:
        for m in months:
            yearmonth = str(y*100+int(m))
            qty = indata.loc[(indata['PN'] == p) & (indata['Year'] == y), m].iloc[0]
            row = [p, yearmonth, qty]
            df = df.append(row)
outdata = df

This seems very inefficient, and my append call does not add one row per iteration but instead adds three rows in a new column.

Any suggestions?

jezrael · Accepted Answer

Use melt to reshape first, then create the new YearMonth column with assign, drop the columns that are no longer needed, and finally sort with sort_values:

df = (indata.melt(id_vars=['PN','Year'], var_name='v', value_name='Qty')
            .assign(YearMonth=lambda x: x['Year'].astype(str) + x['v'])
            .drop(['v', 'Year'], axis=1)
            .sort_values(['PN','YearMonth']))

print (df)
       PN  Qty YearMonth
0   10506    1    201701
4   10506    5    201702
8   10506    9    201703
1   10506    2    201801
5   10506    6    201802
9   10506   10    201803
2   10507    3    201701
6   10507    7    201702
10  10507   11    201703
3   10507    4    201801
7   10507    8    201802
11  10507   12    201803
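
Note that the YearMonth column above is a string, and the index keeps the order produced by melt. If an integer YearMonth and a fresh 0..11 index are wanted, as in the outdata frame from the question, a variant along these lines might work. This is only a sketch: the intermediate column name Month and the variable name result are my own choices, not part of the original code.

```python
import pandas as pd

# Input frame from the question.
indata = pd.DataFrame({
    'PN': [10506, 10506, 10507, 10507],
    'Year': [2017, 2018, 2017, 2018],
    '01': [1, 2, 3, 4],
    '02': [5, 6, 7, 8],
    '03': [9, 10, 11, 12],
})

result = (
    # One row per (PN, Year, month column); 'Month' is an illustrative name.
    indata.melt(id_vars=['PN', 'Year'], var_name='Month', value_name='Qty')
          # Year * 100 + month number gives an integer such as 201701.
          .assign(YearMonth=lambda x: x['Year'] * 100 + x['Month'].astype(int))
          .sort_values(['PN', 'YearMonth'])
          .reset_index(drop=True)
)

# Keep only the columns asked for, in the order used by outdata.
result = result[['PN', 'YearMonth', 'Qty']]
print(result)
```

Computing Year * 100 + month mirrors the arithmetic already used in the question's loop and avoids a string round trip. As for that loop, DataFrame.append was removed in pandas 2.0, so it will not run on current versions in any case.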
