Dread
Dread

Reputation: 861

compare multiple columns in a pandas dataframe

I want to compare multiple columns of a dataframe and find the rows where a value is different. In the example I only have three columns but I want to be able to reuse this process regardless of the number of columns or the column names.

Code to create dataframe:

dict = {'column_name': {0: 'a_desc',
1: 'a_code',
2: 'b_desc',
3: 'b_code',
4: 'fiscal_year'},
'20190508-131122': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190508-151756': {0: 250.0, 1: 51.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190509-074713': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0}}

df = pd.DataFrame(dict)

df = df.set_index('column_name')

Dataframe

column_name   20190508-131122   20190508-151756   20190509-074713
a_desc        250.0             250.0             250.0
a_code        50.0              51.0              50.0
b_desc        100.0             100.0             100.0
b_code        250.0             250.0             250.0
fiscal_year   20.0              20.0              20.0

Here's my latest attempt but it isn't working (there are no matches):

matched = ~df.all(axis=1)
df.loc[matched]

My expectation is that it would print the contents of row 'a_code' since one of the columns has a different value from the others.

I know there are a lot of questions posted about comparing multiple columns but I coudn't find this situation or code that I could repurpose to this scenario.

Upvotes: 2

Views: 1206

Answers (2)

BENY
BENY

Reputation: 323226

IIUC nunique which will return the unique number per row , in your case you need to select the number more than 1 , so at least have one value different from other per row

df[df.nunique(1).ne(1)]
Out[331]: 
             20190508-131122  20190508-151756  20190509-074713
column_name                                                   
a_code                  50.0             51.0             50.0

Upvotes: 3

jezrael
jezrael

Reputation: 862406

Use DataFrame.ne or DataFrame.eq for comparing with DataFrame.any or DataFrame.all for test at least one True or all Trues per rows, DataFrame.iloc is used for select first column:

matched = df.ne(df.iloc[:, 0], axis=0).any(axis=1)
df = df.loc[matched]
print (df)
             20190508-131122  20190508-151756  20190509-074713
column_name                                                   
a_code                  50.0             51.0             50.0

Or:

matched = df.eq(df.iloc[:, 0], axis=0).all(axis=1)
df = df.loc[~matched]

Upvotes: 2

Related Questions