Reputation: 45
I`m cleaning a messy data frame where some of the information needed appear in the column names. This information should melt to a single column that would be created.
index name animal fruit veg
--------------------------------------------------
0 cow animal NaN NaN
1 apple NaN fruit NaN
2 carrot NaN NaN veg
3 dog animal NaN NaN
4 horse animal NaN NaN
5 car NaN NaN NaN
6 pear NaN fruit NaN
7 pepper NaN NaN veg
8 cucumber NaN NaN veg
9 house NaN NaN NaN
I've tried using the pandas.melt()
function, however it returns a lot of rows with "wrong" NaN
values and duplicates.
Some of the rows are supposed to show NaN
, but only the ones who don't fit into the categories specified in the column names, so I can't use the pandas.dropna()
.
Also I can't be sure that removing the duplicates wouldn't remove important data.
This is the code I used:
import pandas as pd
pd.melt(df, id_vars=['index', 'name'],
value_vars=['animal', 'fruit', 'veg'],
var_name='type')
The result I need should look something like this:
index name type
--------------------------------------------------
0 cow animal
1 apple fruit
2 carrot veg
3 dog animal
4 horse animal
5 car NaN
6 pear fruit
7 pepper veg
8 cucumber veg
9 house NaN
Upvotes: 4
Views: 2192
Reputation: 75090
You can just do (assuming the index is not a column , instead an index) , using df.ffill()
on axis=1
:
df['type']=df[df.columns[1:]].ffill(axis=1).iloc[:,-1]
#alternatively-> df['type']=df.loc[:,['animal','fruit','veg']].ffill(axis=1).iloc[:,-1]
df_new=df[['name','type']]
print(df_new)
name type
index
0 cow animal
1 apple fruit
2 carrot veg
3 dog animal
4 horse animal
5 car NaN
6 pear fruit
7 pepper veg
8 cucumber veg
9 house NaN
Upvotes: 2