Thomas Murphy

Reputation: 1468

pandas - compare N columns and output max(equal columns)

Okay - can't quite pandas-foo my way through this one.

I have N (let's say 4 for this purpose) data sources describing the same data, and for each row I want to know the maximum number of sources that agree on a value, as well as that value.

So a sample input would be:

source_1 source_2 source_3 source_4
100      100      98       100

and I want to add two columns to my dataframe, max_sources = 3 and max_value = 100.

I can do this with a good old-fashioned hash map, but I figure there must be a way to pull it off in pandas. equals and compare only work 1:1, but they're the right general idea.
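
For reference, a hash-map version of this could look roughly like the sketch below. It assumes the sample row from above; top_value is a hypothetical helper name, and on ties Counter.most_common returns whichever value it encountered first.

```python
from collections import Counter

import pandas as pd

# Sample row from the question
df = pd.DataFrame({
    'source_1': [100],
    'source_2': [100],
    'source_3': [98],
    'source_4': [100],
})
source_cols = ['source_1', 'source_2', 'source_3', 'source_4']


def top_value(row):
    # Hypothetical helper: count each value in the row with a hash map
    # and keep the most frequent one together with its count.
    value, count = Counter(row).most_common(1)[0]
    return pd.Series({'max_sources': count, 'max_value': value})


# Apply the helper row by row and attach the two new columns
df = df.join(df[source_cols].apply(top_value, axis=1))
```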

Upvotes: 2

Views: 43

Answers (1)

Quang Hoang

Reputation: 150825

Try:

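# identify your source columns (names as in the question)
source_cols = ['source_1', 'source_2', 'source_3', 'source_4']

# row-wise maximum across the sources
max_vals = df[source_cols].max(axis=1)

# number of sources in each row that equal that maximum
df['max_sources'] = df[source_cols].eq(max_vals, axis=0).sum(axis=1)
df['max_value'] = max_vals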
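
Note that the snippet above counts how many sources equal the row maximum. That matches the most common value only when the largest value is also the most frequent one, as in the sample row. If that doesn't always hold, a minimal sketch using the row-wise mode might be closer to what you want. The sample frame here is made up for illustration, and the column names are assumed to be those in the question. On ties, column 0 of mode(axis=1) holds the smallest of the tied values.

```python
import pandas as pd

# Made-up sample: in the second row the most common value (98)
# is not the largest one (100).
df = pd.DataFrame({
    'source_1': [100, 98],
    'source_2': [100, 98],
    'source_3': [98, 98],
    'source_4': [100, 100],
})
source_cols = ['source_1', 'source_2', 'source_3', 'source_4']

# Most frequent value per row; mode(axis=1) returns one column per
# tied mode, so take column 0.
max_vals = df[source_cols].mode(axis=1)[0]

# Number of sources in each row that agree with that value
df['max_sources'] = df[source_cols].eq(max_vals, axis=0).sum(axis=1)
df['max_value'] = max_vals
```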

Upvotes: 1
