Reputation: 1592
I have the following pandas dataframe:
>>>Feature name error1 error2 error3 error4
0 1 A overlaps overlaps overlaps overlaps
1 2 B No error
2 3 C overlaps invalid invalid
3 4 D invalid overlaps overlaps
I would like to keep only the unique errors in each row, e.g.:
>>>Feature name error1 error2 error3 error4
0 1 A overlaps
1 2 B No error
2 3 C overlaps invalid
3 4 D invalid overlaps
Is there a simple way to get this? I thought I might count the number of times each value appears per row, but then I'm not sure how to remove the duplicates.
Upvotes: 1
Views: 55
Reputation: 862406
The idea is to remove duplicates from the error columns row by row, use DataFrame.reindex to add back any columns that were dropped in the process, and assign the result back:
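import numpy as np
import pandas as pd

# select every column whose name contains 'error'
cols = df.filter(like='error').columns
# keep each row's unique values (first occurrences), then pad back to len(cols) columns
df[cols] = (df[cols].apply(lambda x: pd.Series(x.unique()), axis=1)
                    .reindex(np.arange(len(cols)), axis=1))
print(df)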
Feature name error1 error2 error3 error4
0 1 A overlaps NaN NaN NaN
1 2 B No error NaN NaN
2 3 C overlaps invalid NaN NaN
3 4 D invalid overlaps NaN NaN
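For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is my reconstruction of the question's data and assumes the blank cells are NaN; the helper name unique_padded is hypothetical. It is a small variation on the snippet above: it drops missing values first and pads inside the function, so no reindex step is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; blank cells in the question are assumed to be NaN.
df = pd.DataFrame({
    "Feature": [1, 2, 3, 4],
    "name": list("ABCD"),
    "error1": ["overlaps", "No error", "overlaps", "invalid"],
    "error2": ["overlaps", np.nan, "invalid", "overlaps"],
    "error3": ["overlaps", np.nan, "invalid", "overlaps"],
    "error4": ["overlaps", np.nan, np.nan, np.nan],
})

cols = df.filter(like="error").columns


def unique_padded(row):
    # Drop missing values, keep first occurrences in order,
    # then pad with NaN back to the original row length.
    uniq = list(pd.unique(row.dropna()))
    return pd.Series(uniq + [np.nan] * (len(row) - len(uniq)), index=row.index)


# The returned Series reuse the original column labels, so the result
# lines up with df[cols] directly.
df[cols] = df[cols].apply(unique_padded, axis=1)
print(df)
```

Row-wise apply is convenient but not fast, so for very large frames a NumPy-based approach may be preferable.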
Upvotes: 2
Reputation: 323226
Try applying pd.unique to each row of the error columns:
out = pd.DataFrame(list(map(pd.unique, df.loc[:,'error1':].values)),index=df.Feature)
Out[333]:
0 1 2
Feature
1 overlaps None None
2 No error None
3 overlaps invalid None
4 invalid overlaps None
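Note that the result above is a separate frame with fewer columns. If you want the values written back into the original error columns, something along these lines might work. This is only a sketch: the sample data is my guess at the question's frame, it assumes the blank cells are empty strings, and it swaps pd.unique for a plain-Python dict.fromkeys so the blanks can be filtered out before padding.

```python
import pandas as pd

# Hypothetical sample data; blank cells are assumed to be empty strings here.
df = pd.DataFrame({
    "Feature": [1, 2, 3, 4],
    "name": list("ABCD"),
    "error1": ["overlaps", "No error", "overlaps", "invalid"],
    "error2": ["overlaps", "", "invalid", "overlaps"],
    "error3": ["overlaps", "", "invalid", "overlaps"],
    "error4": ["overlaps", "", "", ""],
})

cols = ["error1", "error2", "error3", "error4"]

rows = []
for row in df[cols].to_numpy():
    # dict.fromkeys keeps the first occurrence of each value, in order;
    # empty strings are skipped so they do not count as an "error".
    uniq = list(dict.fromkeys(v for v in row if v != ""))
    # Pad with empty strings so every row has len(cols) entries again.
    rows.append(uniq + [""] * (len(cols) - len(uniq)))

# Same index and column labels as the original, so assignment lines up.
df[cols] = pd.DataFrame(rows, index=df.index, columns=cols)
print(df)
```

Padding in Python keeps the column count fixed, so the other columns (Feature, name) are left untouched.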
Upvotes: 1