Input Table index income Education age1to_20 pcd 1 income_1 Education_0 1 A5009 2 income_2 Education_2 1 A3450 3 income_1 Education_0 1 A5009 4 income_3 Education_1 0 A3450 How do I convert this table into index income_1 income_2 INCOME_3 Education_0 Education_1 Education_2 age1to_20 1 A5009 0 0 A5009 0 0 A5009 2 0 A3450 0 0 0 A3450 A3450 3 A5009 0 0 A5009 0 0 A5009 4 0 0 A3450 0 A3450 0 0 UPDATED THE OUTPUT TABLE

Reputation: 45

How to pivot multiple columns

Input Table

index	income	Education	age1to_20	pcd
1	income_1	Education_0	1	A5009
2	income_2	Education_2	1	A3450
3	income_1	Education_0	1	A5009
4	income_3	Education_1	0	A3450

How do I convert this table into

index	income_1	income_2	INCOME_3	Education_0	Education_1	Education_2	age1to_20
1	A5009	0	0	A5009	0	0	A5009
2	0	A3450	0	0	0	A3450	A3450
3	A5009	0	0	A5009	0	0	A5009
4	0	0	A3450	0	A3450	0	0

UPDATED THE OUTPUT TABLE

Upvotes: 1

Answers (4)

Gonçalo Peres

Reputation: 13582

pandas.get_dummies allows us to get almost the desired output, but instead of getting the values as the pcd column, one has 0 or 1 for a given pcd

import pandas as pd

df2 = pd.get_dummies(df, columns=['income', 'Education'])

[Out]:

   index  age1to_20  ... Education_Education_1  Education_Education_2
0      1          1  ...                     0                      0
1      2          1  ...                     0                      1
2      3          1  ...                     0                      0
3      4          0  ...                     1                      0

Then, in order to change the 1s with the respective pcd value (and drop the column pcd at the end), one can use pandas.Series.mask and pandas.DataFrame.drop as follows

cols = df2.columns.drop(['index', 'pcd']) # To select the columns one wants to apply the change
df2[cols] = df2[cols].mask(df2[cols] == 1, df2['pcd'], axis=0)
df2 = df2.drop(['pcd'], axis=1) 

[Out]:

   index age1to_20  ... Education_Education_1 Education_Education_2
0      1     A5009  ...                     0                     0
1      2     A3450  ...                     0                 A3450
2      3     A5009  ...                     0                     0
3      4         0  ...                 A3450                     0

Notes:

An alternative to change the 1s to the respective pcd and drop the column pcd would be to use pandas.DataFrame.assign, numpy.where and pandas.DataFrame.drop as follows

import numpy as np

df2 = df2.assign(income_1 = np.where(df2['income_income_1'] == 1, df2['pcd'], 0),
              income_2 = np.where(df2['income_income_2'] == 1, df2['pcd'], 0),
              income_3 = np.where(df2['income_income_3'] == 1, df2['pcd'], 0),
              Education_0 = np.where(df2['Education_Education_0'] == 1, df2['pcd'], 0),
              Education_1 = np.where(df2['Education_Education_1'] == 1, df2['pcd'], 0),
              Education_2 = np.where(df2['Education_Education_2'] == 1, df2['pcd'], 0),
              Age1to20 = np.where(df2['age1to_20'] == 1, df2['pcd'], 0)).drop(['age1to_20', 'income_income_1', 'income_income_2', 'income_income_3', 'Education_Education_0', 'Education_Education_1', 'Education_Education_2', 'pcd'], axis=1)

[Out]:

   index income_1 income_2  ... Education_1 Education_2 Age1to20
0      1    A5009        0  ...           0           0    A5009
1      2        0    A3450  ...           0       A3450    A3450
2      3    A5009        0  ...           0           0    A5009
3      4        0        0  ...       A3450           0        0

There are additional ways to select the columns one wants to consider in a given dataframe. For that this might be useful: Selecting multiple columns in a Pandas dataframe

Upvotes: 0

mozway

Reputation: 260380

you need to melt and pivot:

cols_to_pivot = ['income', 'Education']

cols = df.columns.difference(cols_to_pivot)
(df.melt(cols).drop(columns='variable')
   .pivot(index=['index', 'age1to_20'], columns='value', values='pcd')
   .fillna(0).reset_index()
)

output:

value  index  age1to_20 Education_0 Education_1 Education_2 income_1 income_2 income_3
0          1          1       A5009           0           0    A5009        0        0
1          2          1           0           0       A3450        0    A3450        0
2          3          1       A5009           0           0    A5009        0        0
3          4          0           0       A3450           0        0        0    A3450

reworking column age1to_20

cols_to_pivot = ['income', 'Education']

cols = df.columns.difference(cols_to_pivot)
(df.assign(age1to_20=df['pcd'].where(df['age1to_20'].eq(1), 0))
   .melt(cols).drop(columns='variable')
   .pivot(index=['index', 'age1to_20'], columns='value', values='pcd')
   .fillna(0).reset_index()
)

output:

value  index age1to_20 Education_0 Education_1 Education_2 income_1 income_2 income_3
0          1     A5009       A5009           0           0    A5009        0        0
1          2     A3450           0           0       A3450        0    A3450        0
2          3     A5009       A5009           0           0    A5009        0        0
3          4         0           0       A3450           0        0        0    A3450

Upvotes: 2

sammywemmy

Reputation: 28644

One option is to pivot once, run a for loop across the dataframe to get the individual dataframes, and concatenate across the columns, to get the final output:

out = df.pivot(index=['index', 'age1to_20'], 
               columns=['income', 'Education'], 
               values = 'pcd')

out = [out.droplevel(axis=1,level=n) 
       for n in range(out.columns.nlevels)]

pd.concat(out, axis = 1).fillna(0, downcast='infer').reset_index()

   index  age1to_20 Education_0 Education_2 Education_1 income_1 income_2 income_3
0      1          1       A5009           0           0    A5009        0        0
1      2          1           0       A3450           0        0    A3450        0
2      3          1       A5009           0           0    A5009        0        0
3      4          0           0           0       A3450        0        0    A3450

Upvotes: 1

PaulS

Reputation: 25323

Another possible solution:

(pd.concat([
    df.pivot(index=['index', 'age1to_20'], columns=['income'], values='pcd'),
    df.pivot(index=['index', 'age1to_20'], columns=['Education'], values='pcd')], axis=1)
 .fillna(0).reset_index())

Output:

   index  age1to_20 income_1 income_2 income_3 Education_0 Education_1 Education_2
0      1          1    A5009        0        0       A5009           0           0
1      2          1        0    A3450        0           0           0       A3450
2      3          1    A5009        0        0       A5009           0           0
3      4          0        0        0    A3450           0       A3450           0

EDIT

In case there are a lot of columns to pivot, the following code does that, by iterating over the list of columns to pivot with map:

cols = ['income', 'Education']

(pd.concat(
    map(lambda x: df.pivot(
        index=['index', 'age1to_20'], columns=x, values='pcd'), cols), axis=1)
 .fillna(0).reset_index())

Upvotes: 4

How to pivot multiple columns

Answers (4)

reworking column age1to_20

Related Questions