Reputation: 134
I have a pandas dataframe:
col1 | col2 | col3 | col4 |
0. A | B | C | G|
1. I | J | S | D|
2. O | L | C | G|
3. A | B | H | D|
4. H | B | C | P|
# reproducible
import pandas as pd
from string import ascii_uppercase as uc # just for sample data
import random # just for sample data
random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
'col2': [random.choice(uc) for _ in range(20)],
'col3': [random.choice(uc) for _ in range(20)],
'col4': [random.choice(uc) for _ in range(20)]})
I'm looking for a function like this:
func('H')
which will return all the names of indexes and columns where "H" is. Any ideas?
Upvotes: 2
Views: 62
Reputation: 153500
Use np.where and indexing (updated to add performance):
r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))
Output:
[(3, 'col3'), (4, 'col1')]
Upvotes: 2
Reputation: 71687
Use, np.argwhere
along with df.to_numpy
:
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
Or,
indices = df.where(df.eq('H')).stack().index.tolist()
# print(indices)
[(3, 'col3'), (4, 'col1')]
timeit
comparision of all the answers:
df.shape
(50000, 4)
%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c]))
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 3
Reputation: 12523
One solution would be to use melt:
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
print(t[t.value == "H"])
The output is:
inx variable value
4 4 col1 H
13 3 col3 H
You can now easily extract columns and indices.
Upvotes: 2