Reut

Reputation: 1592

Remove repeated values in different columns

I have the following pandas dataframe:

    Feature name    error1    error2    error3    error4
0         1    A  overlaps  overlaps  overlaps  overlaps
1         2    B  No error
2         3    C  overlaps   invalid   invalid
3         4    D   invalid  overlaps  overlaps

For each row I would like to keep only the unique errors, e.g.:

    Feature name    error1    error2    error3    error4
0         1    A  overlaps
1         2    B  No error
2         3    C  overlaps   invalid
3         4    D   invalid  overlaps

Is there a simple way to do this? I thought of counting how many times each value appears per row, but then I'm not sure how to remove the repeats.
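A minimal sketch of the counting idea, swapping the explicit count for pandas' `Series.duplicated`, which flags any value already seen earlier in the same row. It assumes the columns are `Feature`, `name` and `error1` to `error4`, that blank cells are NaN, and that "No error" is a single value. Note that it blanks repeats in place rather than shifting later values left, so a row like `A, A, B` would become `A, NaN, B`.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: blanks are NaN, "No error" is one value)
df = pd.DataFrame({
    'Feature': [1, 2, 3, 4],
    'name': ['A', 'B', 'C', 'D'],
    'error1': ['overlaps', 'No error', 'overlaps', 'invalid'],
    'error2': ['overlaps', np.nan, 'invalid', 'overlaps'],
    'error3': ['overlaps', np.nan, 'invalid', 'overlaps'],
    'error4': ['overlaps', np.nan, np.nan, np.nan],
})

cols = [c for c in df.columns if c.startswith('error')]
# True where the value already appeared earlier in the same row
dup = df[cols].apply(lambda row: row.duplicated(), axis=1).astype(bool)
# Keep first occurrences, replace repeats with NaN (no left shift)
df[cols] = df[cols].mask(dup)
print(df)
```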

Upvotes: 1

Views: 55

Answers (2)

jezrael

Reputation: 862406

The idea is to remove the duplicates from the error columns row by row, use DataFrame.reindex to add back any columns that were lost, and assign the result back:

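import numpy as np
import pandas as pd

cols = df.filter(like='error').columns
# keep each row's unique errors in order of first appearance,
# then pad back out to the original number of error columns
df[cols] = (df[cols].apply(lambda x: pd.Series(x.unique()), axis=1)
                    .reindex(np.arange(len(cols)), axis=1))
print(df)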
   Feature name    error1    error2  error3  error4
0        1    A  overlaps       NaN     NaN     NaN
1        2    B        No     error     NaN     NaN
2        3    C  overlaps   invalid     NaN     NaN
3        4    D   invalid  overlaps     NaN     NaN
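A self-contained sketch of the same approach on the question's sample data, assuming blanks are NaN and "No error" is a single value. The `dropna()` call is an addition not in the answer above: it keeps a blank that sits between two errors from being treated as a "unique" value. Assigning through `.to_numpy()` makes the positional mapping of columns 0..3 onto `error1`..`error4` explicit.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: blanks are NaN, "No error" is one value)
df = pd.DataFrame({
    'Feature': [1, 2, 3, 4],
    'name': ['A', 'B', 'C', 'D'],
    'error1': ['overlaps', 'No error', 'overlaps', 'invalid'],
    'error2': ['overlaps', np.nan, 'invalid', 'overlaps'],
    'error3': ['overlaps', np.nan, 'invalid', 'overlaps'],
    'error4': ['overlaps', np.nan, np.nan, np.nan],
})

cols = df.filter(like='error').columns
# Per row: drop blanks, keep first occurrences in order, expand to columns 0..k-1
uniq = df[cols].apply(lambda x: pd.Series(x.dropna().unique()), axis=1)
# Pad to as many columns as there are error columns, then assign by position
df[cols] = uniq.reindex(np.arange(len(cols)), axis=1).to_numpy()
print(df)
```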

Upvotes: 2

BENY

Reputation: 323226

Try with

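import pandas as pd

# pd.unique keeps values in order of first appearance; rows of different
# lengths are padded with missing values by the DataFrame constructor
out = pd.DataFrame(list(map(pd.unique, df.loc[:, 'error1':].values)), index=df.Feature)
print(out)
                0         1     2
Feature                          
1        overlaps      None  None
2              No     error  None
3        overlaps   invalid  None
4         invalid  overlaps  None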
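A runnable sketch of the same `pd.unique` approach on the question's sample data. It assumes blanks are NaN and "No error" is one value; unlike the one-liner above, it filters NaN out of each row before calling `pd.unique`, so blanks are not kept as a value. The result columns are positional (0, 1, ...) and indexed by `Feature`.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: blanks are NaN, "No error" is one value)
df = pd.DataFrame({
    'Feature': [1, 2, 3, 4],
    'name': ['A', 'B', 'C', 'D'],
    'error1': ['overlaps', 'No error', 'overlaps', 'invalid'],
    'error2': ['overlaps', np.nan, 'invalid', 'overlaps'],
    'error3': ['overlaps', np.nan, 'invalid', 'overlaps'],
    'error4': ['overlaps', np.nan, np.nan, np.nan],
})

# For each row of the error columns: drop missing cells, keep first occurrences
rows = [pd.unique(r[~pd.isna(r)]) for r in df.loc[:, 'error1':].to_numpy()]
# Shorter rows are padded with missing values
out = pd.DataFrame(rows, index=df['Feature'])
print(out)
```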

Upvotes: 1
