Reputation: 979
I have dataframe like this:
Date ID Age Gender Fruits
1.1.19 1 50 F Apple
2.1.19 1 50 F Mango
2.1.19 1 50 F Orange
1.1.19 2 75 M Grapes
4.1.19 3 20 M Apple
4.1.19 3 20 M Grapes
for example i have two lists:
fruits_list = ['Apple', 'Mango', 'Orange', 'Grapes', 'Banana', 'Guava']
date_list = ['1.1.19', '2.1.19', '3.1.19', '4.1.19', '5.1.19', '6.1.19']
I want to convert the Fruit column into further columns which gives binary info yes/no for each person. And the missing date should be NaN. by using this
pd.get_dummies(df, columns=['Fruits'], prefix='', prefix_sep='').groupby('Date').max()
i will get this but i need all the elements which are in fruit_list and date_list
Date ID Age Gender Apple Mango Orange Grapes
1.1.19 1 50 F 1 0 0 0
1.1.19 2 75 M 0 0 0 1
2.1.19 1 50 F 0 1 1 0
3.1.19 NaN NaN NaN NaN NaN NaN NaN
4.1.19 3 20 M 1 0 0 1
Desired output would be like this.
Date ID Age Gender Apple Mango Orange Grapes Banana Guava sum
1.1.19 1 50 F 1 0 0 0 0 0 1
1.1.19 2 75 M 0 0 0 1 0 0 1
2.1.19 1 50 F 0 1 1 0 0 0 2
3.1.19 NaN NaN NaN 0 0 0 0 0 0 0
4.1.19 3 20 M 1 0 0 1 0 0 2
5.1.19 NaN NaN NaN 0 0 0 0 0 0 0
6.1.19 NaN NaN NaN 0 0 0 0 0 0 0
Upvotes: 1
Views: 53
Reputation: 863166
First is necessary datetimes in date_list
and Date
column for avoid missing values, then use DataFrame.reindex
for add not exist Datetimes and columns:
df1 = (pd.get_dummies(df, columns=['Fruits'], prefix='', prefix_sep='')
.groupby('Date')
.max()
.reindex(index=date_list, columns=fruits_list, fill_value=0)
.assign(sum = lambda x: x.sum(axis=1)))
print (df1)
Apple Mango Orange Grapes Banana Guava sum
Date
1.1.19 1 0 0 1 0 0 2
2.1.19 0 1 1 0 0 0 2
3.1.19 0 0 0 0 0 0 0
4.1.19 1 0 0 1 0 0 2
5.1.19 0 0 0 0 0 0 0
6.1.19 0 0 0 0 0 0 0
Then add original columns by DataFrame.merge
and change order of columns by reindex
:
cols = ['Date','ID','Age','Gender'] + fruits_list
df = df1.reset_index().merge(df, on='Date', how='left').reindex(cols, axis=1)
print (df)
Date ID Age Gender Apple Mango Orange Grapes Banana Guava
0 1.1.19 1.0 50.0 F 1 0 0 1 0 0
1 1.1.19 2.0 75.0 M 1 0 0 1 0 0
2 2.1.19 1.0 50.0 F 0 1 1 0 0 0
3 2.1.19 1.0 50.0 F 0 1 1 0 0 0
4 3.1.19 NaN NaN NaN 0 0 0 0 0 0
5 4.1.19 3.0 20.0 M 1 0 0 1 0 0
6 4.1.19 3.0 20.0 M 1 0 0 1 0 0
7 5.1.19 NaN NaN NaN 0 0 0 0 0 0
8 6.1.19 NaN NaN NaN 0 0 0 0 0 0
Upvotes: 3