Reputation: 1293
This question has been asked but I didn't find the answers complete. I have a dataframe that has unnecessary values in the first row and I want to find the row index of the animals:
df = pd.DataFrame({'a':['apple','rhino','gray','horn'],
'b':['honey','elephant', 'gray','trunk'],
'c':['cheese','lion', 'beige','mane']})
a b c
0 apple honey cheese
1 rhino elephant lion
2 gray gray beige
3 horn trunk mane
ani_pat = r"rhino|zebra|lion"
That means I want to find "1" - the row index that matches the pattern. One solution I saw here was like this; applying to my problem...
def findIdx(df, pattern):
return df.apply(lambda x: x.str.match(pattern, flags=re.IGNORECASE)).values.nonzero()
animal = findIdx(df, ani_pat)
print(animal)
(array([1, 1], dtype=int64), array([0, 2], dtype=int64))
That output is a tuple of NumPy arrays. I've got the basics of NumPy and Pandas, but I'm not sure what to do with this or how it relates to the df above.
I altered that lambda expression like this:
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE))
a b c
0 False False False
1 True False True
2 False False False
3 False False False
That makes a little more sense. but still trying to get the row index of the True values. How can I do that?
Upvotes: 0
Views: 43
Reputation: 35676
We can select from the filter the DataFrame index
where there are rows that have any
True value in them:
idx = df.index[
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
]
idx
:
Int64Index([1], dtype='int64')
any
on axis 1 will take the boolean DataFrame and reduce it to a single dimension based on the contents of the rows.
Before any
:
a b c
0 False False False
1 True False True
2 False False False
3 False False False
After any
:
0 False
1 True
2 False
3 False
dtype: bool
We can then use these boolean values as a mask for index
(selecting indexes which have a True value):
Int64Index([1], dtype='int64')
If needed we can use tolist
to get a list instead:
idx = df.index[
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
].tolist()
idx
:
[1]
Upvotes: 1