Reputation: 468

Split one dataframe row into multiple rows and add a column

I have the following dummy df:

columns = ['Part', 'Task_Color', 'Task', 'Solution', 'correct_one', 'incorrect_two', 'correct_three']
data = [['16a', 'Red', 'What is on the table?', 'bottle', 'a bottle', 'nothing', 'empty bottle'],
       ['16b', 'Red', 'Write hi', 'hi', 'hello', 'hey', 'no'],
       ['16c', 'Red', 'Cats or dogs', 'both', 'cats', 'dogs', 'both'],
        ['17a', 'Red', 'Nice to meet you', 'bottle', 'a bottle', 'nothing', 'empty bottle'],
        ['17b', 'Red', 'bla bla', 'nope', 'nope', 'nope', 'nope'],
        ['18a', 'Blue', 'Do x and write y', 'y', 'y', '...', 'what'],
        ['18b', 'Blue', 'Help me', 'sure', 'ok', 'yes', 'no']]
df = pd.DataFrame(columns=columns, data=data)

The df looks like this

    Part    Task_Color  Task                    Solution    correct_one   incorrect_two  correct_three
0   16a     Red         What is on the table?   bottle      a bottle      nothing       emptybottle
1   16b     Red         Write hi                hi          hello         hey           no
2   16c     Red         Cats or dogs            both        cats          dogs          both
3   17a     Red         Nice to meet you        bottle      a bottle      nothing       empty bottle
4   17b     Red         bla bla                 nope        nope          nope          nope
5   18a     Blue        Do x and write y        y           y             ...           what
6   18b     Blue        Help me                 sure        ok            yes           no

This is the dataframe I want to create out of the df above:

Part  Task_Color    Task                     Solution    answer         label
16a   Red           What is on the table?    bottle      a bottle       1
16a   Red           What is on the table?    bottle      nothing        0
16a   Red           What is on the table?    bottle      empty bottle   1
16b   Red           Write hi                 hi          hello          1
16b   Red           Write hi                 hi          hey            0
16b   Red           Write hi                 hi          no             1
16c   Red           Cats or dogs             both        cats           1
16c   Red           Cats or dogs             both        dogs           0
16c   Red           Cats or dogs             both        both           1
17a   Red           Nice to meet you         bottle      a bottle       1
17a   Red           Nice to meet you         bottle      nothing        0
17a   Red           Nice to meet you         bottle      empty bottle   1
...

I want to split every row in the original df into multiple rows so that every correct_one, incorrect_two and correct_three has its own row as illustrated above. The content that was inside these three columns should be in a new column called answer.

Moreover, I want to add a new column, label, that is filled with a 1 if the column header includes correct and elif includes incorrect a 0.

Upvotes: 0

Answers (3)

Corralien

Reputation: 120409

Another (one-line) solution with melt

>>> df.melt(df.columns[:4].tolist(), var_name="label", value_name="answer") \ 
      .replace({"label": {"correct_one": 1, "incorrect_two": 0, "correct_three": 1}}) \
      .sort_values("Part").reset_index(drop=True)

   Part Task_Color                   Task Solution  label        answer
0   16a        Red  What is on the table?   bottle      1      a bottle
1   16a        Red  What is on the table?   bottle      1  empty bottle
2   16a        Red  What is on the table?   bottle      0       nothing
3   16b        Red               Write hi       hi      1         hello
4   16b        Red               Write hi       hi      1            no
5   16b        Red               Write hi       hi      0           hey
6   16c        Red           Cats or dogs     both      1          cats
7   16c        Red           Cats or dogs     both      1          both
8   16c        Red           Cats or dogs     both      0          dogs
9   17a        Red       Nice to meet you   bottle      1  empty bottle
10  17a        Red       Nice to meet you   bottle      0       nothing
11  17a        Red       Nice to meet you   bottle      1      a bottle
12  17b        Red                bla bla     nope      0          nope
13  17b        Red                bla bla     nope      1          nope
14  17b        Red                bla bla     nope      1          nope
15  18a       Blue       Do x and write y        y      0           ...
16  18a       Blue       Do x and write y        y      1             y
17  18a       Blue       Do x and write y        y      1          what
18  18b       Blue                Help me     sure      0           yes
19  18b       Blue                Help me     sure      1            ok
20  18b       Blue                Help me     sure      1            no

Upvotes: 2

rudolfovic

Reputation: 3276

def combine(rows):
  return [
    (rows.correct_one, 1),
    (rows.incorrect_two, 0),
    (rows.correct_three, 1)]

df['answers'] = df.apply(combine, axis=1)
df = df.explode('answers')
df[['answer','correct']] = pd.DataFrame(
  df['answers'].tolist(),
  index= df.index)
df.drop('answers', axis=1)

Upvotes: 1

Haleemur Ali

Reputation: 28243

you can use stack like below to reshape the data.

output_df = df.set_index(['Part', 'Task_Color', 'Task', 'Solution']).stack().rename('answer').reset_index()

then calculate the label & drop the column level_4

df['label'] = (~df.level_4.str.contains('incorrect')).astype(int)
del df['level_4']

Upvotes: 1

Split one dataframe row into multiple rows and add a column

Answers (3)

Related Questions