Reputation: 323
I am converting a data frame from long to wide format. The problem I am facing is generating the right number of new columns and renaming them dynamically.
Let's say I have a sample data frame as follows:
import pandas as pd
data = {'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'], 'id': [20, 21, 20, 21, 22], 'plan': [100, 101, 102, 101, 100], 'drug': ['a', 'b', 'b', 'c', 'a']}
df = pd.DataFrame(data)
drug id name plan
a 20 Tom 100
b 21 nick 101
b 20 Tom 102
c 21 nick 101
a 22 Tom 100
For every given name and id, I want to create multiple columns for plan and drug. For example, there are 3 distinct plans and 3 distinct drugs, so ideally I should get 6 new columns, each indicating whether a particular plan or drug has been taken.
I tried converting from long to wide, but I am not getting the desired result. My attempt:
df1 = df.groupby(['name','id'])[['plan', 'drug']].apply(lambda x: pd.DataFrame(x.values)).unstack().reset_index()
Actual output:
name id 0 1 0 1
Tom 20 100 102 a b
nick 21 101 101 b c
Tom 22 100 None a None
Expected output:
name id 100 101 102 a b c
Tom 20 1 0 1 1 1 0
Tom 22 1 0 0 1 0 0
nick 21 0 1 0 0 1 1
Upvotes: 1
Views: 877
Reputation: 863166
Use get_dummies with max:
df1 = pd.get_dummies(df.set_index(['name','id']).astype(str)).max(level=[0,1]).reset_index()
print(df1)
name id plan_100 plan_101 plan_102 drug_a drug_b drug_c
0 Tom 20 1 0 1 1 1 0
1 nick 21 0 1 0 0 1 1
2 Tom 22 1 0 0 1 0 0
df2 = (pd.get_dummies(df.set_index(['name','id'])
                        .astype(str), prefix='', prefix_sep='')
         .max(level=[0,1])
         .reset_index())
print(df2)
name id 100 101 102 a b c
0 Tom 20 1 0 1 1 1 0
1 nick 21 0 1 0 0 1 1
2 Tom 22 1 0 0 1 0 0
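On pandas 2.0 and later, the `level` argument of `max` is no longer available and `get_dummies` returns booleans by default. A minimal sketch of the same idea for newer versions, assuming the sample data from the question (the names `dummies` and `wide` are only illustrative), could look like this:

```python
import pandas as pd

data = {
    'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
    'id': [20, 21, 20, 21, 22],
    'plan': [100, 101, 102, 101, 100],
    'drug': ['a', 'b', 'b', 'c', 'a'],
}
df = pd.DataFrame(data)

# One-hot encode plan and drug; dtype=int keeps 0/1 instead of True/False.
dummies = pd.get_dummies(
    df.set_index(['name', 'id']).astype(str),
    prefix='', prefix_sep='', dtype=int,
)

# Collapse duplicate (name, id) rows: the max is 1 if the value ever appeared.
# sort=False keeps the groups in order of first appearance.
wide = dummies.groupby(level=['name', 'id'], sort=False).max().reset_index()
print(wide)
```

Grouping on the index levels replaces `max(level=[0,1])`; dropping `sort=False` would instead order the rows by name and id.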
EDIT: Solution with DataFrame.pivot_table, concat and DataFrame.clip:
df1 = df.pivot_table(index=['name','id'],
                     columns=['plan'],
                     aggfunc='size',
                     fill_value=0)
df2 = df.pivot_table(index=['name','id'],
                     columns=['drug'],
                     aggfunc='size',
                     fill_value=0)
df = pd.concat([df1, df2], axis=1).clip(upper=1).reset_index()
print(df)
name id 100 101 102 a b c
0 Tom 20 1 0 1 1 1 0
1 Tom 22 1 0 0 1 0 0
2 nick 21 0 1 0 0 1 1
Upvotes: 2
Reputation: 1745
import pandas as pd
data = {
'name':['Tom', 'nick', 'Tom', 'nick','Tom'],
'id':[20, 21, 20, 21,22],
'plan': [100,101,102,101,100],
'drug': ['a','b','b','c','a']
}
df = pd.DataFrame(data)
plans = df.groupby(['name', 'id', 'plan']).size().unstack()
drugs = df.groupby(['name', 'id', 'drug']).size().unstack()
merged_df = pd.merge(plans, drugs, left_index=True, right_index=True)
merged_df = merged_df.fillna(0)
Get the plan and drug counts for each name and id (that's what size() followed by unstack() is for), and then just merge them on their index (which is set to name and id).
Use fillna to replace NaN with 0.
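Note that this approach yields counts rather than 0/1 flags: in the sample data, nick/21 has plan 101 twice, so that cell is 2. If indicator columns are wanted, as in the question's expected output, one possible follow-up step is to cap the counts at 1. The sketch below assumes the same sample data; the name `flags` is only illustrative:

```python
import pandas as pd

data = {
    'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
    'id': [20, 21, 20, 21, 22],
    'plan': [100, 101, 102, 101, 100],
    'drug': ['a', 'b', 'b', 'c', 'a'],
}
df = pd.DataFrame(data)

# Count occurrences of each plan / drug per (name, id), one column per value.
plans = df.groupby(['name', 'id', 'plan']).size().unstack()
drugs = df.groupby(['name', 'id', 'drug']).size().unstack()

# Align the two tables on their shared (name, id) index; missing combos become 0.
merged_df = pd.merge(plans, drugs, left_index=True, right_index=True).fillna(0)

# Cap counts at 1 and cast to int to get taken / not-taken indicators.
flags = merged_df.clip(upper=1).astype(int).reset_index()
print(flags)
```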
Upvotes: 0