I have a DataFrame, df, with one-hot-style columns in which the zeros are represented as "nan". I'm trying to collapse these columns into a single column.
Assume the following DataFrame, df:
    p1   | p2   | p3    | p4     | p5  |
----------------------------------------
0   cat    nan    nan     nan      nan
1   nan    dog    nan     nan      nan
2   nan    nan    horse   nan      nan
3   nan    nan    nan     donkey   nan
4   nan    nan    nan     nan      pig
Required output:
    animals
-----------------
0 cat
1 dog
2 horse
3 donkey
4 pig
Upvotes: 7
Views: 3147
Reputation: 1055
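Silly, but it works. I'm not sure what you expect when a row has more than one non-NA value; this keeps the first one.
cols = list(df.columns)  # capture the source columns before adding 'animals'
df['animals'] = df[cols[0]]
for c in cols[1:]:
    # assign the result back; inplace fillna on a column selection may not update df
    df['animals'] = df['animals'].fillna(df[c])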
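To make the "more than one non-NA value" case concrete, here is a minimal self-contained sketch of the same loop. The sample data is hypothetical (row 0 deliberately has two values), and it assumes the missing cells are real NaN values rather than the string "nan".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; row 0 has two non-NA values on purpose.
df = pd.DataFrame({
    "p1": ["cat", np.nan, np.nan],
    "p2": [np.nan, "dog", np.nan],
    "p3": ["lion", np.nan, "horse"],
})

source_cols = list(df.columns)       # capture before adding the new column
animals = df[source_cols[0]]
for c in source_cols[1:]:
    animals = animals.fillna(df[c])  # existing values stay, only gaps are filled
df["animals"] = animals

print(df["animals"].tolist())
```

Because fillna only touches missing cells, row 0 keeps "cat" and "lion" is ignored.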
Upvotes: 0
Reputation: 17824
If each row holds exactly one word, you can fill the NaN values with empty strings and sum across each row:
df.fillna('').sum(axis=1)
Result:
0 cat
1 dog
2 horse
3 donkey
4 pig
dtype: object
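A small sketch of why the one-word-per-row condition matters. The data is hypothetical, and it assumes object-dtype columns holding plain Python strings (forced here with dtype=object); behaviour with other string dtypes may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dtype=object keeps plain Python strings in the cells.
df = pd.DataFrame(
    {
        "p1": ["cat", np.nan, np.nan],
        "p2": [np.nan, "dog", np.nan],
        "p3": ["lion", np.nan, "horse"],
    },
    dtype=object,
)

# Summing strings concatenates them, so empty strings contribute nothing.
combined = df.fillna("").sum(axis=1)
print(combined.tolist())
```

Row 0 has two words, and they are glued together with no separator, so this approach is only safe when each row really has a single value.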
Upvotes: 0
Reputation: 862641
If there is always exactly one non-missing value per row, forward fill the missing values along the columns (DataFrame.ffill with axis=1, the equivalent of DataFrame.fillna with method='ffill'), then select the last column by position with DataFrame.iloc. To get a one-column DataFrame, add Series.to_frame:
df = df.ffill(axis=1).iloc[:, -1].to_frame('new')
print (df)
new
0 cat
1 dog
2 horse
3 donkey
4 pig
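As a rough illustration of what the forward fill does when the one-value-per-row assumption is broken, the sketch below uses hypothetical data with two values in row 0. The bfill variant is not part of the answer above; it is shown only as the mirror image for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; row 0 has two non-missing values.
df = pd.DataFrame({
    "p1": ["cat", np.nan, np.nan],
    "p2": [np.nan, "dog", np.nan],
    "p3": ["lion", np.nan, "horse"],
})

# Forward fill left to right, then take the last column: last value per row.
last = df.ffill(axis=1).iloc[:, -1].to_frame("new")

# Mirror image (an assumption, not from the answer): backward fill, first column.
first = df.bfill(axis=1).iloc[:, 0].to_frame("new")

print(last["new"].tolist())
print(first["new"].tolist())
```

With exactly one value per row the two variants agree; with several, ffill keeps the rightmost value and bfill keeps the leftmost.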
If a row can contain more than one non-missing value, use DataFrame.stack and then join the values per first index level:
print (df)
p1 p2 p3 p4 p5
0 cat NaN NaN NaN lion
1 NaN dog NaN NaN NaN
2 NaN NaN horse NaN NaN
3 NaN NaN NaN donkey NaN
4 NaN NaN NaN NaN pig
df2 = df.stack().groupby(level=0).apply(', '.join).to_frame('new')
print (df2)
new
0 cat, lion
1 dog
2 horse
3 donkey
4 pig
Or use a lambda function:
df2 = df.apply(lambda x: x.dropna().str.cat(sep=', '), axis=1).to_frame('new')
print (df2)
new
0 cat, lion
1 dog
2 horse
3 donkey
4 pig
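The lambda variant can be wrapped in a small helper for reuse. This is only a sketch: the function name join_non_missing and its sep parameter are hypothetical, and the sample data is made up.

```python
import numpy as np
import pandas as pd


def join_non_missing(frame: pd.DataFrame, sep: str = ", ") -> pd.DataFrame:
    """Join the non-missing string values of each row (hypothetical helper)."""
    joined = frame.apply(lambda row: row.dropna().str.cat(sep=sep), axis=1)
    return joined.to_frame("new")


# Hypothetical sample data; row 0 has two non-missing values.
df = pd.DataFrame({
    "p1": ["cat", np.nan, np.nan],
    "p2": [np.nan, "dog", np.nan],
    "p3": ["lion", np.nan, "horse"],
})

df2 = join_non_missing(df)
print(df2)
```

Rows with a single value come back unchanged, and rows with several values are joined in column order.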
Upvotes: 6