Melt on multiple levels in Pandas

I have a df with part number, year and the consumed quantity per month as below where the value in 01, 02 and 03 is the quantity in Jan-Mar per year.

d = {'PN': [10506,10506,10507,10507],
 'Year': [2017, 2018, 2017, 2018],
 '01': [1,2,3,4],
 '02': [5,6,7,8],
 '03': [9,10,11,12]}
indata = pd.DataFrame(data = d)

I would like to restructure it to long format by combining Year and Month to YYYYMM format and have Part Number, YearMonth and Qty per row as below.

dd = {'PN': [10506,10506,10506,10506,10506,10506,10507,10507,10507,10507,10507,10507],
  'YearMonth': [201701,201702,201703,201801,201802,201803,201701,201702,201703,201801,201802,201803],
  'Qty': [1,5,9,2,6,10,3,7,11,4,8,12]}
outdata = pd.DataFrame(data = dd)

Since I failed using pd.melt I gave it a try using triple for loops as below.

parts = pd.Series(indata['PN']).unique()
years = pd.Series(indata['Year']).unique()
months = ['01', '02', '03']

df = pd.DataFrame(columns = ['PN', 'YearMonth', 'Qty'])

for p in parts:
    for y in years:
        for m in months:
            yearmonth = str(y*100+int(m))
            qty = indata.loc[(indata['PN'] == p) & (indata['Year'] == y), m].iloc[0]
            row = [p, yearmonth, qty]
            df = df.append(row)
outdata = df

This seems very inefficient and my append function does not add a row per loop but rather three rows in a new column.

Any suggestions?

Upvotes: 2

Views: 896

Answers (2)

jezrael
jezrael

Reputation: 862601

Use melt for reshape first, then create new column YearMonth by assign, remove unnecessary columns and last sort_values:

df = (indata.melt(id_vars=['PN','Year'], var_name='v', value_name='Qty')
            .assign(YearMonth=lambda x: x['Year'].astype(str) + x['v'])
            .drop(['v', 'Year'], axis=1)
            .sort_values(['PN','YearMonth']))

print (df)
       PN  Qty YearMonth
0   10506    1    201701
4   10506    5    201702
8   10506    9    201703
1   10506    2    201801
5   10506    6    201802
9   10506   10    201803
2   10507    3    201701
6   10507    7    201702
10  10507   11    201703
3   10507    4    201801
7   10507    8    201802
11  10507   12    201803

Upvotes: 3

BENY
BENY

Reputation: 323226

You can using melt

s=indata.melt(['Year','PN'])
s['Year']=s.Year.astype(str)+s.variable.astype(str)
s
Out[262]: 
      Year     PN variable  value
0   201701  10506       01      1
1   201801  10506       01      2
2   201701  10507       01      3
3   201801  10507       01      4
4   201702  10506       02      5
5   201802  10506       02      6
6   201702  10507       02      7
7   201802  10507       02      8
8   201703  10506       03      9
9   201803  10506       03     10
10  201703  10507       03     11
11  201803  10507       03     12

Or just stack

s=indata.set_index(['Year','PN']).stack().reset_index()
s['YearMonth']=s.Year.astype(str)+s['level_2'].astype(str)
s.rename(columns={0:'Qty'}).drop(['level_2','Year'],1)
Out[274]: 
       PN  Qty YearMonth
0   10506    1  20170101
1   10506    5  20170202
2   10506    9  20170303
3   10506    2  20180101
4   10506    6  20180202
5   10506   10  20180303
6   10507    3  20170101
7   10507    7  20170202
8   10507   11  20170303
9   10507    4  20180101
10  10507    8  20180202
11  10507   12  20180303

Upvotes: 1

Related Questions