Reputation: 720
I have a dataframe of the form:
basket fruit_apple fruit_pear fruit_cherry
basket_id_1 5 NaN 6
basket_id_2 NaN 1 NaN
and I want to create two new columns that should look like this:
basket fruit_type number
basket_id_1 apple 5
basket_id_1 pear NaN
basket_id_1 cherry 6
basket_id_2 apple NaN
basket_id_2 pear 1
basket_id_2 cherry NaN
where the contents of fruit_type are determined automatically with a colname.split('_')[1]
How can this be done in a concise way?
Upvotes: 2
Views: 108
Reputation: 28729
Transform from wide to long format,
create new columns by stripping 'fruit_' from fruit_type column
and sorting by 'basket' column
Edit: thanks to @jezrael, who pointed out the dangers of stripping (it will take out any of the alphabets, which will definitely affect scalability to other use cases), changed it to replace, which should be safer, since it works on regex and grouping.
(df.melt(id_vars='basket',
var_name='fruit_type',
value_name='number')
.assign(fruit_type=lambda x: x.fruit_type.str.replace(r'fruit_',''))
.sort_values('basket')
)
basket fruit_type number
0 basket_id_1 apple 5.0
2 basket_id_1 pear NaN
4 basket_id_1 cherry 6.0
1 basket_id_2 apple NaN
3 basket_id_2 pear 1.0
5 basket_id_2 cherry NaN
Upvotes: 2
Reputation: 863741
I think faster is change column names insted new column (because less number of values), so I suggest use rename
in first step and reshape with DataFrame.melt
, last change order by DataFrame.sort_values
:
df = (df.rename(columns= lambda x: x.split('_')[-1])
.melt('basket', var_name='fruit_type', value_name='number')
.sort_values('basket'))
print (df)
basket fruit_type number
0 basket_id_1 apple 5.0
2 basket_id_1 pear NaN
4 basket_id_1 cherry 6.0
1 basket_id_2 apple NaN
3 basket_id_2 pear 1.0
5 basket_id_2 cherry NaN
Another idea is use DataFrame.set_index
with DataFrame.stack
:
df = (df.set_index('basket')
.rename(columns= lambda x: x.split('_')[-1])
.rename_axis('fruit_type', axis=1)
.stack(dropna=False)
.reset_index(name='number')
)
print (df)
basket fruit_type number
0 basket_id_1 apple 5.0
1 basket_id_1 pear NaN
2 basket_id_1 cherry 6.0
3 basket_id_2 apple NaN
4 basket_id_2 pear 1.0
5 basket_id_2 cherry NaN
Upvotes: 7