mrgou
mrgou

Reputation: 2458

Get Dummies into Rows with pandas

Say this is my DataFrame:

+-------+-----+----------------------+--------------------------+
| Month | Day | Jane                 | Bob                      |
+=======+=====+======================+==========================+
| Mar   | 10  | ['Orange', 'Apples'] | ['Banana', 'Apples']     |
+-------+-----+----------------------+--------------------------+
| Mar   | 11  | ['Pear', 'Grape']    | ['Strawberry', 'Cherry'] |
+-------+-----+----------------------+--------------------------+

Ultimately, I want to get to this (watch for "Apple" which should appear only once in the output):

+--------+-----+------------+------+-----+
| Month  | Day | Fruit      | Jane | Bob |
+========+=====+============+======+=====+
| Mar    | 10  | Orange     | 1    | 0   |
+--------+-----+------------+------+-----+
| Mar    | 10  | Apple      | 1    | 1   |
+--------+-----+------------+------+-----+
| Mar    | 10  | Banana     | 0    | 1   |
+--------+-----+------------+------+-----+
| Mar    | 11  | Pear       | 1    | 0   |
+--------+-----+------------+------+-----+
| Mar    | 11  | Grape      | 1    | 0   |
+--------+-----+------------+------+-----+
| Mar    | 11  | Strawberry | 0    | 1   |
+--------+-----+------------+------+-----+
| Mar    | 11  | Cherry     | 0    | 1   |
+--------+-----+------------+------+-----+

So kind of trying to do a get_dummies that expands into rows. Not quite sure how to tackle this. I tried starting with explode for each column, but in my real-life case with many more columns and values, this causes the number of rows to blow out as well, with each "Fruit" being repeated for each person (unlike the above case with "Apple").

Any suggestion?

Upvotes: 1

Views: 491

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

We can stack the dataframe to reshape then explode the reshaped frame followed by groupby on Month, Day and Fruit then agg level_2 with join and encode using get_dummies

c = ['Month', 'Day']
out = df.set_index(c).stack().explode().reset_index(name='Fruit')
out.groupby([*c, 'Fruit'], sort=False)['level_2'].agg(','.join).str.get_dummies(sep=',')

                      Bob  Jane
Month Day Fruit                
Mar   10  Orange        0     1
          Apples        1     1
          Banana        1     0
      11  Pear          0     1
          Grape         0     1
          Strawberry    1     0
          Cherry        1     0

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195438

df["Fruit"] = df.loc[:, "Jane":].apply(
    lambda x: set(i for l in x for i in l), axis=1
)

df = df.explode("Fruit")

for c in df.loc[:, "Jane":"Bob"]:
    df[c] = df.apply(lambda x: int(x["Fruit"] in x[c]), axis=1)

print(df)

Prints:

  Month  Day  Jane  Bob       Fruit
0   Mar   10     1    0      Orange
0   Mar   10     1    1      Apples
0   Mar   10     0    1      Banana
1   Mar   11     0    1      Cherry
1   Mar   11     0    1  Strawberry
1   Mar   11     1    0        Pear
1   Mar   11     1    0       Grape

Upvotes: 0

Related Questions