David Serero
David Serero

Reputation: 134

Finding Column and Index in pandas dataframe

I have a pandas dataframe:

  col1 | col2 | col3 | col4 |
0.  A  | B    |    C |     G|
1.  I  | J    |    S |     D|
2.  O  | L    |    C |     G|
3.  A  | B    |    H |     D|
4.  H  | B    |    C |     P|

# reproducible
import pandas as pd
from string import ascii_uppercase as uc  # just for sample data
import random  # just for sample data

random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})

I'm looking for a function like this:

func('H')

which will return all the names of indexes and columns where "H" is. Any ideas?

Upvotes: 2

Views: 62

Answers (3)

Scott Boston
Scott Boston

Reputation: 153500

Use np.where and indexing (updated to add performance):

r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))

Output:

[(3, 'col3'), (4, 'col1')]

Upvotes: 2

Shubham Sharma
Shubham Sharma

Reputation: 71687

Use, np.argwhere along with df.to_numpy:

rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))

Or,

indices = df.where(df.eq('H')).stack().index.tolist()

# print(indices)
[(3, 'col3'), (4, 'col1')]

timeit comparision of all the answers:

df.shape
(50000, 4)

%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols])) 
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c])) 
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 3

Roy2012
Roy2012

Reputation: 12523

One solution would be to use melt:

df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
print(t[t.value == "H"])

The output is:

    inx variable value
4     4     col1     H
13    3     col3     H

You can now easily extract columns and indices.

Upvotes: 2

Related Questions