DJNebs
DJNebs

Reputation: 43

Reshape pandas dataframe by transposing certain columns to rows

I have a pandas dataframe like this:

+-------+-----+-----------------+----------------+----------------+
| name  | age | favourite_color | favourite_food | favourite_city |
+-------+-----+-----------------+----------------+----------------+
| john  |  25 | yellow          | apple          | munich         |
| james |  24 | red             | onion          | melbourne      |
+-------+-----+-----------------+----------------+----------------+

I want to make it look like this:


+-------+-----+--------------------+-----------------------------+
| name  | age | favourite category | value of favourite category |
+-------+-----+--------------------+-----------------------------+
| john  |  25 | color              | yellow                      |
| john  |  25 | food               | apple                       |
| john  |  25 | city               | munich                      |
| james |  24 | color              | red                         |
| james |  24 | food               | onion                       |
| james |  24 | city               | melbourne                   |
+-------+-----+--------------------+-----------------------------+

I want to take the 3 "favorite_thing" categories and break them down from 3 columns, to 2 columns. One column should be the name of the thing and the other column should be the value of the thing.

In the process, the number of rows in the table, should get multiplied by the number of favourite_something categories (in this case 3x), and the common values like name and age should just get duplicated.

At a high level, my dataset contains 3 observations per row for each person: favourite color, food and city. I want to reshape the dataframe so that there is only a single observation in any given row.

Please tell me what is this operation called in data processing and the easiest way to do it in python and pandas.

Upvotes: 2

Views: 116

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

Since there is some commonality in the columns u wish to reshape('favourite'), pandas' wide to long can be deployed here : Simply state the various arguments, and pass it into the function:

  pd.wide_to_long(df,
                  stubnames ='favourite',
                  i=['name','age'],
                  j='favourite category',
                  suffix = r'(color|food|city)',
                  sep='_').rename(columns={'favourite':'value of favourite category'}).reset_index()



    name    age favourite category  value of favourite category
0   john    25       color                 yellow
1   john    25       food                  apple
2   john    25       city                  munich
3   james   24       color                 red
4   james   24       food                  onion
5   james   24       city                  melbourne

Upvotes: 0

Vaishali
Vaishali

Reputation: 38415

This is essentially a reshaping problem that you can solve using DataFrame.melt. Your dataframe has multiple value variables,

new_df = df.melt(id_vars= ['name', 'age'], value_vars=['favourite_color', 'favourite_food', 'favourite_city'], \
        var_name='favourite category', value_name='value of favourite category')
new_df['favourite category'] = new_df['favourite category'].replace({'favourite_':''}, regex = True)

    name    age favourite category  value of favourite category
0   john    25  color               yellow
1   james   24  color               red
2   john    25  food                apple
3   james   24  food                onion
4   john    25  city                munich
5   james   24  city                melbourne

Upvotes: 2

Related Questions