Reputation: 861
I want to compare multiple columns of a dataframe and find the rows where a value is different. In the example I only have three columns but I want to be able to reuse this process regardless of the number of columns or the column names.
Code to create dataframe:
dict = {'column_name': {0: 'a_desc',
1: 'a_code',
2: 'b_desc',
3: 'b_code',
4: 'fiscal_year'},
'20190508-131122': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190508-151756': {0: 250.0, 1: 51.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190509-074713': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0}}
df = pd.DataFrame(dict)
df = df.set_index('column_name')
Dataframe
column_name 20190508-131122 20190508-151756 20190509-074713
a_desc 250.0 250.0 250.0
a_code 50.0 51.0 50.0
b_desc 100.0 100.0 100.0
b_code 250.0 250.0 250.0
fiscal_year 20.0 20.0 20.0
Here's my latest attempt but it isn't working (there are no matches):
matched = ~df.all(axis=1)
df.loc[matched]
My expectation is that it would print the contents of row 'a_code' since one of the columns has a different value from the others.
I know there are a lot of questions posted about comparing multiple columns but I coudn't find this situation or code that I could repurpose to this scenario.
Upvotes: 2
Views: 1206
Reputation: 323226
IIUC nunique
which will return the unique number per row , in your case you need to select the number more than 1 , so at least have one value different from other per row
df[df.nunique(1).ne(1)]
Out[331]:
20190508-131122 20190508-151756 20190509-074713
column_name
a_code 50.0 51.0 50.0
Upvotes: 3
Reputation: 862406
Use DataFrame.ne
or DataFrame.eq
for comparing with DataFrame.any
or DataFrame.all
for test at least one True
or all True
s per rows, DataFrame.iloc
is used for select first column:
matched = df.ne(df.iloc[:, 0], axis=0).any(axis=1)
df = df.loc[matched]
print (df)
20190508-131122 20190508-151756 20190509-074713
column_name
a_code 50.0 51.0 50.0
Or:
matched = df.eq(df.iloc[:, 0], axis=0).all(axis=1)
df = df.loc[~matched]
Upvotes: 2