s_khan92
s_khan92

Reputation: 979

How to generate columns from a list of column in dataframe?

I have dataframe like this:

  Date       ID     Age      Gender        Fruits
  1.1.19     1      50         F            Apple
  2.1.19     1      50         F            Mango
  2.1.19     1      50         F            Orange
  1.1.19     2      75         M            Grapes
  4.1.19     3      20         M            Apple
  4.1.19     3      20         M            Grapes

for example i have two lists:

fruits_list = ['Apple', 'Mango', 'Orange', 'Grapes', 'Banana', 'Guava']
date_list = ['1.1.19', '2.1.19', '3.1.19', '4.1.19', '5.1.19', '6.1.19']

I want to convert the Fruit column into further columns which gives binary info yes/no for each person. And the missing date should be NaN. by using this

pd.get_dummies(df, columns=['Fruits'], prefix='', prefix_sep='').groupby('Date').max()

i will get this but i need all the elements which are in fruit_list and date_list

Date          ID     Age      Gender        Apple    Mango      Orange    Grapes 
1.1.19         1      50         F            1        0           0         0
1.1.19         2      75         M            0        0           0         1
2.1.19         1      50         F            0        1           1         0 
3.1.19         NaN    NaN        NaN          NaN     NaN        NaN       NaN
4.1.19         3      20         M            1        0           0         1

Desired output would be like this.

Date             ID     Age      Gender        Apple    Mango      Orange    Grapes    Banana   Guava   sum
    1.1.19         1      50         F            1        0           0         0     0        0       1 
    1.1.19         2      75         M            0        0           0         1     0        0       1
    2.1.19         1      50         F            0        1           1         0     0        0       2
    3.1.19         NaN    NaN        NaN          0        0           0         0     0        0       0
    4.1.19         3      20         M            1        0           0         1     0        0       2
    5.1.19         NaN    NaN        NaN          0        0           0         0     0        0       0
    6.1.19         NaN    NaN        NaN          0        0           0         0     0        0        0

Upvotes: 1

Views: 53

Answers (1)

jezrael
jezrael

Reputation: 863166

First is necessary datetimes in date_list and Date column for avoid missing values, then use DataFrame.reindex for add not exist Datetimes and columns:

df1 = (pd.get_dummies(df, columns=['Fruits'], prefix='', prefix_sep='')
         .groupby('Date')
         .max()
         .reindex(index=date_list, columns=fruits_list, fill_value=0)
         .assign(sum = lambda x: x.sum(axis=1)))
print (df1)

        Apple  Mango  Orange  Grapes  Banana  Guava  sum
Date                                                    
1.1.19      1      0       0       1       0      0    2
2.1.19      0      1       1       0       0      0    2
3.1.19      0      0       0       0       0      0    0
4.1.19      1      0       0       1       0      0    2
5.1.19      0      0       0       0       0      0    0
6.1.19      0      0       0       0       0      0    0

Then add original columns by DataFrame.merge and change order of columns by reindex:

cols = ['Date','ID','Age','Gender'] + fruits_list
df = df1.reset_index().merge(df, on='Date', how='left').reindex(cols, axis=1)
print (df)
     Date   ID   Age Gender  Apple  Mango  Orange  Grapes  Banana  Guava
0  1.1.19  1.0  50.0      F      1      0       0       1       0      0
1  1.1.19  2.0  75.0      M      1      0       0       1       0      0
2  2.1.19  1.0  50.0      F      0      1       1       0       0      0
3  2.1.19  1.0  50.0      F      0      1       1       0       0      0
4  3.1.19  NaN   NaN    NaN      0      0       0       0       0      0
5  4.1.19  3.0  20.0      M      1      0       0       1       0      0
6  4.1.19  3.0  20.0      M      1      0       0       1       0      0
7  5.1.19  NaN   NaN    NaN      0      0       0       0       0      0
8  6.1.19  NaN   NaN    NaN      0      0       0       0       0      0

Upvotes: 3

Related Questions