Reputation: 1263
I have csv like this,
Market_id Metric New_Mar_19 New_Feb_19 tot_Mar_19 tot_Feb_19
7528 1 9 7 6.5 8
7528 2 6 3.2 2.2 3
I want to convert the above csv to like this below,
Month Nrx_count Nrx_quantity Trx_count Trx_quantity
Mar_19 9 6 6.5 2.2
Feb_19 7 3.2 8 3
I have tried using pivot and pd.melt but I am not able to like above.
Upvotes: 1
Views: 57
Reputation: 863791
Idea is convert columns without _
to MultiIndex
, then split column by _
and reshape by DataFrame.stack
, then by DataFrame.unstack
:
df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
New tot
Market_id Metric
7528 1 Feb_19 7.0 8.0
Mar_19 9.0 6.5
2 Feb_19 3.2 3.0
Mar_19 6.0 2.2
#second level, here Metric is converted to columns
df1 = df1.unstack(1)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
Market_id y New_1 New_2 tot_1 tot_2
0 7528 Feb_19 7.0 3.2 8.0 3.0
1 7528 Mar_19 9.0 6.0 6.5 2.2
#first level, here Market_id is converted to columns
df1 = df1.unstack(0)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
Metric y New_7528 tot_7528
0 1 Feb_19 7.0 8.0
1 1 Mar_19 9.0 6.5
2 2 Feb_19 3.2 3.0
3 2 Mar_19 6.0 2.2
EDIT:
If get error:
ValueError: Index contains duplicate entries, cannot reshape
it means there are duplicates. Solution is same, only is necessary add aggregation - e.g. by mean
, sum
...
#change data for duplicates in column for Multiindex - Market_id, Metric
print (df)
Market_id Metric New_Mar_19 New_Feb_19 tot_Mar_19 tot_Feb_19
0 7528 1 2 5.0 5.0 8
1 7528 1 10 20.0 2.5 16
2 7528 2 6 3.2 2.2 3
df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
New tot
Market_id Metric
7528 1 Feb_19 5.0 8.0
Mar_19 2.0 5.0
Feb_19 20.0 16.0
Mar_19 10.0 2.5
2 Feb_19 3.2 3.0
Mar_19 6.0 2.2
#number of levels passes to groupby - here 3 levels -> 0,1,2
print (df1.index.nlevels)
3
#grouping by all levels and aggregate mean, sum,max...
df1 = df1.groupby(level=[0,1,2]).mean().unstack(1)
print (df1)
New tot
Metric 1 2 1 2
Market_id
7528 Feb_19 12.5 3.2 12.00 3.0
Mar_19 6.0 6.0 3.75 2.2
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
Market_id y New_1 New_2 tot_1 tot_2
0 7528 Feb_19 12.5 3.2 12.00 3.0
1 7528 Mar_19 6.0 6.0 3.75 2.2
Upvotes: 3