Reputation: 43
I have a pandas dataframe like this:
+-------+-----+-----------------+----------------+----------------+
| name | age | favourite_color | favourite_food | favourite_city |
+-------+-----+-----------------+----------------+----------------+
| john | 25 | yellow | apple | munich |
| james | 24 | red | onion | melbourne |
+-------+-----+-----------------+----------------+----------------+
I want to make it look like this:
+-------+-----+--------------------+-----------------------------+
| name | age | favourite category | value of favourite category |
+-------+-----+--------------------+-----------------------------+
| john | 25 | color | yellow |
| john | 25 | food | apple |
| john | 25 | city | munich |
| james | 24 | color | red |
| james | 24 | food | onion |
| james | 24 | city | melbourne |
+-------+-----+--------------------+-----------------------------+
I want to take the 3 "favorite_thing" categories and break them down from 3 columns, to 2 columns. One column should be the name of the thing and the other column should be the value of the thing.
In the process, the number of rows in the table, should get multiplied by the number of favourite_something categories (in this case 3x), and the common values like name and age should just get duplicated.
At a high level, my dataset contains 3 observations per row for each person: favourite color, food and city. I want to reshape the dataframe so that there is only a single observation in any given row.
Please tell me what is this operation called in data processing and the easiest way to do it in python and pandas.
Upvotes: 2
Views: 116
Reputation: 28644
Since there is some commonality in the columns u wish to reshape('favourite'), pandas' wide to long can be deployed here : Simply state the various arguments, and pass it into the function:
pd.wide_to_long(df,
stubnames ='favourite',
i=['name','age'],
j='favourite category',
suffix = r'(color|food|city)',
sep='_').rename(columns={'favourite':'value of favourite category'}).reset_index()
name age favourite category value of favourite category
0 john 25 color yellow
1 john 25 food apple
2 john 25 city munich
3 james 24 color red
4 james 24 food onion
5 james 24 city melbourne
Upvotes: 0
Reputation: 38415
This is essentially a reshaping problem that you can solve using DataFrame.melt
. Your dataframe has multiple value variables,
new_df = df.melt(id_vars= ['name', 'age'], value_vars=['favourite_color', 'favourite_food', 'favourite_city'], \
var_name='favourite category', value_name='value of favourite category')
new_df['favourite category'] = new_df['favourite category'].replace({'favourite_':''}, regex = True)
name age favourite category value of favourite category
0 john 25 color yellow
1 james 24 color red
2 john 25 food apple
3 james 24 food onion
4 john 25 city munich
5 james 24 city melbourne
Upvotes: 2