Pandas output difference columns

Question

given a df:

Id  Name X Y Z
111 abc  1 3 5 
111 abc  0   1
222 ijk    2
222 lmo  1 1

Grouping them by Id (since their Ids always match), I would like to output all the column names that has different values:

Id  Diff
111 X, Y, Z
222 Name, X, Y

Usually I use np.where to spot the difference but I don't think it would help in this case. Any suggestion would be appreciated. Many thanks!

Zero · Accepted Answer

Use

In [184]: df.groupby('Id').apply(lambda x: x.columns[x.nunique().ne(1)].tolist())
Out[184]:
Id
111       [X, Y, Z]
222    [Name, X, Y]
dtype: object

Also, with column name

In [210]: df.groupby('Id').apply(
              lambda x: x.columns[x.nunique().ne(1)].tolist()
            ).reset_index(name='Diff')
Out[210]:
    Id          Diff
0  111     [X, Y, Z]
1  222  [Name, X, Y]

Another simpler shorthand could be

In [213]: df.groupby('Id').nunique().ne(1).dot(df.columns)
Out[213]:
Id
111       XYZ
222    NameXY
dtype: object

Pandas output difference columns

Answers (2)

Related Questions