Reputation: 77

How to find which column contains a certain value?

I have a dataframe like this:

test = pd.DataFrame({"id":[1,2,3,4],
                     "name_1":["peter","bobby","alex","chris"],
                     "name_1_flag":["real","fake","fake","real"],
                     "name_2":["hector","abi","henrik","miko"],
                     "name_2_flag":["fake","real","fake","fake"],
                     "name_3":["hans","khan","will","than"],
                     "name_3_flag":["fake","fake","real","fake"]})

   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag
0   1  peter        real  hector        fake   hans        fake
1   2  bobby        fake     abi        real   khan        fake
2   3   alex        fake  henrik        fake   will        real
3   4  chris        real    miko        fake   than        fake

How can I find the row/column tuple that has the word "real" in it.

Optimally the output would be an array or series like this:

    col_index
0           3
1           5
2           7
3           3

Upvotes: 4

Answers (4)

Henry Yik

Reputation: 22503

You can also use dot:

print (test.eq("real").dot(range(test.columns.size))+1)

0    3
1    5
2    7
3    3
dtype: int32

Upvotes: 1

BENY

Reputation: 323366

Let us try with

s = test.where(lambda x : x=='real').stack()
test['new'] = test.columns.get_indexer(s.index.get_level_values(1))+1
test
Out[11]: 
   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag  new
0   1  peter        real  hector        fake   hans        fake    3
1   2  bobby        fake     abi        real   khan        fake    5
2   3   alex        fake  henrik        fake   will        real    7
3   4  chris        real    miko        fake   than        fake    3

Upvotes: 1

U13-Forward

Reputation: 71610

Solutions:

Try np.argmax:

>>> np.argmax(test.eq('real').to_numpy(), axis=1) + 1
array([3, 5, 7, 3], dtype=int64)
>>>

--- Or get_indexer:

test.columns.get_indexer(test.eq('real').idxmax(axis=1)) + 1

Or .T.reset_index(drop=True):

test.T.reset_index(drop=True).eq('real').idxmax() + 1

Making it a column:

np.argmax:

test["col_index"] = np.argmax(test.eq('real').to_numpy(), axis=1) + 1

The one with get_indexer:

test["col_index"] = test.columns.get_indexer(test.eq('real').idxmax(axis=1)) + 1

.T:

test["col_index"] = test.T.reset_index(drop=True).eq('real').idxmax() + 1

All Output:

   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag  col_index
0   1  peter        real  hector        fake   hans        fake          3
1   2  bobby        fake     abi        real   khan        fake          5
2   3   alex        fake  henrik        fake   will        real          7
3   4  chris        real    miko        fake   than        fake          3

Upvotes: 3

Dani Mesejo

Reputation: 61930

Use np.where:

test["col_index"] = np.where(test.eq("real"))[1] + 1
print(test)

Output

   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag  col_index
0   1  peter        real  hector        fake   hans        fake          3
1   2  bobby        fake     abi        real   khan        fake          5
2   3   alex        fake  henrik        fake   will        real          7
3   4  chris        real    miko        fake   than        fake          3

Upvotes: 4

How to find which column contains a certain value?

Answers (4)

Solutions:

Making it a column:

Related Questions