bakas

Reputation: 323

Convert the data frame from long to wide format and dynamically name columns

I am converting a data frame from long to wide format, but I am having trouble generating the right number of new columns and naming them dynamically.

So let's say I have a sample data frame as follows:

import pandas as pd

data = {'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
        'id': [20, 21, 20, 21, 22],
        'plan': [100, 101, 102, 101, 100],
        'drug': ['a', 'b', 'b', 'c', 'a']}

df = pd.DataFrame(data)
drug id  name   plan
a    20  Tom    100
b    21  nick   101
b    20  Tom    102
c    21  nick   101
a    22  Tom    100

So for every given name and id, I want to create one column per distinct plan and one per distinct drug. For example, there are 3 distinct plans and 3 distinct drugs, so ideally I should get 6 new columns that indicate whether a particular plan or drug has been taken.

I tried converting from long to wide, but I am not getting the desired result. My attempt:

df1 = df.groupby(['name','id'])['plan', 'drug'].apply(lambda x: pd.DataFrame(x.values)).unstack().reset_index()

Actual output:

name    id  0   1    0   1
Tom     20  100 102  a  b
nick    21  101 101  b  c
Tom     22  100 None a  None

Expected output:

   name    id   100 101 102   a   b  c
   Tom     20   1   0    1    1   1  0
   Tom     22   1   0    0    1   0  0
   nick    21   0   1    0    0   1  1

Upvotes: 1

Views: 877

Answers (2)

jezrael

Reputation: 863166

Use get_dummies with max:

df1 = (pd.get_dummies(df.set_index(['name','id']).astype(str), dtype=int)
         .groupby(level=[0,1], sort=False)
         .max()
         .reset_index())
print(df1)
   name  id  plan_100  plan_101  plan_102  drug_a  drug_b  drug_c
0   Tom  20         1         0         1       1       1       0
1  nick  21         0         1         0       0       1       1
2   Tom  22         1         0         0       1       0       0

df2 = (pd.get_dummies(df.set_index(['name','id']).astype(str),
                      prefix='', prefix_sep='', dtype=int)
         .groupby(level=[0,1], sort=False)
         .max()
         .reset_index())
print(df2)
   name  id  100  101  102  a  b  c
0   Tom  20    1    0    1  1  1  0
1  nick  21    0    1    0  0  1  1
2   Tom  22    1    0    0  1  0  0
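To see why get_dummies followed by max works, it can help to split the chain into its two steps. The sketch below is only an illustration, assuming a recent pandas version; the names dummies and wide are made up for this example. get_dummies first produces one 0/1 row per original row, and the grouped max then collapses rows that share the same (name, id), keeping a 1 wherever any of those rows had one.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
    'id': [20, 21, 20, 21, 22],
    'plan': [100, 101, 102, 101, 100],
    'drug': ['a', 'b', 'b', 'c', 'a'],
})

# Step 1: one indicator row per original row, still indexed by (name, id).
# astype(str) makes the numeric plan column eligible for encoding.
dummies = pd.get_dummies(
    df.set_index(['name', 'id']).astype(str),
    prefix='', prefix_sep='', dtype=int,
)
print(dummies)

# Step 2: collapse rows with the same (name, id); max keeps a 1 if any row had it.
# sort=False keeps the groups in order of first appearance.
wide = dummies.groupby(level=['name', 'id'], sort=False).max().reset_index()
print(wide)
```

The intermediate dummies frame has five rows (one per input row), while wide has three, one per distinct (name, id) pair.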

EDIT: Solution with DataFrame.pivot_table, concat and DataFrame.clip:

df1 = df.pivot_table(index=['name','id'],
                     columns=['plan'],
                     aggfunc='size',
                     fill_value=0)

df2 = df.pivot_table(index=['name','id'],
                     columns=['drug'],
                     aggfunc='size',
                     fill_value=0)

df = pd.concat([df1, df2], axis=1).clip(upper=1).reset_index()
print(df)
   name  id  100  101  102  a  b  c
0   Tom  20    1    0    1  1  1  0
1   Tom  22    1    0    0  1  0  0
2  nick  21    0    1    0  0  1  1
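The clip(upper=1) step matters because aggfunc='size' counts rows rather than flagging presence. A small sketch of this, using only the plan column of the sample data (the names counts and flags are just for this illustration): nick/21 appears twice with plan 101, so the raw pivot holds a 2 there until it is clipped.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
    'id': [20, 21, 20, 21, 22],
    'plan': [100, 101, 102, 101, 100],
    'drug': ['a', 'b', 'b', 'c', 'a'],
})

# aggfunc='size' counts how many rows fall into each (name, id) x plan cell.
counts = df.pivot_table(index=['name', 'id'], columns='plan',
                        aggfunc='size', fill_value=0)
print(counts)

# clip(upper=1) caps every count at 1, turning counts into 0/1 indicators.
flags = counts.clip(upper=1)
print(flags)
```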

Upvotes: 2

zero

Reputation: 1745

import pandas as pd

data = {
    'name':['Tom', 'nick', 'Tom', 'nick','Tom'],
    'id':[20, 21, 20, 21,22],
    'plan': [100,101,102,101,100],
    'drug': ['a','b','b','c','a']
}

df = pd.DataFrame(data)

plans = df.groupby(['name', 'id', 'plan']).size().unstack()
drugs = df.groupby(['name', 'id', 'drug']).size().unstack()

merged_df = pd.merge(plans, drugs, left_index=True, right_index=True)

merged_df = merged_df.fillna(0)

Get the plan and drug counts for each name and id (that is what size() followed by unstack() is for).

Then merge them on their index (which is set to name and id).

Use fillna to replace NaN with 0.
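Note that this gives counts (as floats, because of the NaN step), not 0/1 flags; for example nick/21 has plan 101 twice. If indicator columns like those in the question are wanted, one possible variation is sketched below. It is an assumption about the desired result rather than part of the answer above: unstack(fill_value=0) avoids NaN so the values stay integers, join aligns on the shared index, and clip caps counts at 1. The name flags is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
    'id': [20, 21, 20, 21, 22],
    'plan': [100, 101, 102, 101, 100],
    'drug': ['a', 'b', 'b', 'c', 'a'],
})

# Filling missing combinations with 0 during unstack keeps integer dtypes.
plans = df.groupby(['name', 'id', 'plan']).size().unstack(fill_value=0)
drugs = df.groupby(['name', 'id', 'drug']).size().unstack(fill_value=0)

# join aligns both frames on the (name, id) index;
# clip(upper=1) converts counts into 0/1 indicators.
flags = plans.join(drugs).clip(upper=1).reset_index()
print(flags)
```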

Upvotes: 0
