harshit

Reputation: 3846

How to filter dataframe rows if a column value (string) contains any of the values in a set in Python?

I want to keep only the rows whose cell string contains any one of the values in a predefined set.

For example, for the following dataframe:

   ids ids2  vals
0  a h  a i     1
1  b z  n a     2
2  f z  c a     3
3  n i  n h     4

I want the following rows extracted (the rows that have 'h' or 'i' in the ids column):

   ids ids2  vals
0  a h  a i     1
3  n i  n h     4

Code to generate the dataframe:

import pandas as pd

d = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a h', 'b z', 'f z', 'n i'], 'ids2': ['a i', 'n a', 'c a', 'n h']})

What I have done so far:

d[d['ids'].str.contains('h')|d['ids'].str.contains('i')]

Here the predefined set is small, and str.contains is case-sensitive by default. Is there a way to do this case-insensitively, or with some method that checks against a whole list of values? I tried this:

d[len(re.findall('h|i',d['ids'].str,re.IGNORECASE)) > 0]

but it gives me TypeError: expected string or bytes-like object.

or this:

data[any(d['name'].str.contains(x) for x in ['h','i'])]

which gives KeyError: 'name'. Can someone help me with this?
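For reference: the first attempt fails because re.findall expects a single string, while d['ids'].str is pandas' string-accessor object, hence the TypeError. The second attempt raises KeyError simply because the frame has no 'name' column. The vectorized route is Series.str.contains, which accepts flags from the re module. A minimal sketch using the question's frame:

```python
import re

import pandas as pd

# Same example frame as in the question
d = pd.DataFrame({'vals': [1, 2, 3, 4],
                  'ids': ['a h', 'b z', 'f z', 'n i'],
                  'ids2': ['a i', 'n a', 'c a', 'n h']})

# str.contains runs the regex against every cell and returns a boolean Series;
# flags=re.IGNORECASE gives the case-insensitivity the re.findall attempt was after
mask = d['ids'].str.contains('h|i', flags=re.IGNORECASE)

# Boolean indexing keeps only the rows where the mask is True
result = d[mask]
print(result)
```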

Upvotes: 1

Views: 4342

Answers (2)

EdChum

Reputation: 394031

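You can do this easily by passing a regex that joins the terms with |, and case=False makes the match case-insensitive:

In [132]:
d[d['ids'].str.contains('h|i', case=False)]

Out[132]:
   ids ids2  vals
0  a h  a i     1
3  n i  n h     4

Prefixing the mask with ~ inverts it and returns the complementary rows (1 and 2) instead.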
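If the terms live in a Python set or list rather than being typed into the pattern, the alternation can be built with '|'.join. The sketch below assumes the terms should be matched literally, so it escapes each one with re.escape to stop characters such as '.' or '+' from acting as regex syntax; the terms variable is illustrative:

```python
import re

import pandas as pd

d = pd.DataFrame({'vals': [1, 2, 3, 4],
                  'ids': ['a h', 'b z', 'f z', 'n i'],
                  'ids2': ['a i', 'n a', 'c a', 'n h']})

# Illustrative set of search terms; any iterable of strings works
terms = {'h', 'i'}

# Escape each term so regex metacharacters are matched literally, then join
# them into one alternation pattern (sorted only to make the pattern deterministic)
pattern = '|'.join(map(re.escape, sorted(terms)))

result = d[d['ids'].str.contains(pattern, case=False)]
print(result)
```

With an empty set the pattern becomes '' and matches every row, so guard against that case if the set can be empty.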

Upvotes: 1

Greg Friedman

Reputation: 341

Use case=False to make the match case-insensitive:

d[d['ids'].str.contains('h', case=False) | d['ids'].str.contains('i', case=False)]
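Chaining one str.contains call per term gets long as the list grows. A sketch that builds the same OR for an arbitrary, non-empty list folds the per-term masks together with functools.reduce and operator.or_. It assumes the terms should be matched as literal substrings (regex=False). This is also why the any(...) attempt in the question cannot work: the built-in any() asks each whole Series for a single True/False value, which pandas refuses with a ValueError.

```python
import operator
from functools import reduce

import pandas as pd

d = pd.DataFrame({'vals': [1, 2, 3, 4],
                  'ids': ['a h', 'b z', 'f z', 'n i'],
                  'ids2': ['a i', 'n a', 'c a', 'n h']})

letters = ['h', 'i']  # must be non-empty: reduce() without an initial value fails on []

# One boolean mask per term; regex=False matches each term as a literal substring
masks = [d['ids'].str.contains(x, case=False, regex=False) for x in letters]

# Combine the masks element-wise: masks[0] | masks[1] | ... | masks[-1]
combined = reduce(operator.or_, masks)

result = d[combined]
print(result)
```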

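A more roundabout alternative splits each string into whitespace-separated tokens and keeps the rows whose tokens overlap the list. Note that it matches whole tokens only (not substrings) and is case-sensitive:

letters = ['h', 'i']
# split each cell on whitespace; keep rows sharing at least one token with letters
d[d['ids'].str.split().apply(lambda x: len(set(x).intersection(set(letters))))>0]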
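A case-insensitive variant of the token approach, assuming whole-token matching is what is wanted and that the search terms are already lower-case, lower-cases each cell before splitting and uses set.isdisjoint instead of building an intersection:

```python
import pandas as pd

d = pd.DataFrame({'vals': [1, 2, 3, 4],
                  'ids': ['a h', 'b z', 'f z', 'n i'],
                  'ids2': ['a i', 'n a', 'c a', 'n h']})

# Search terms, assumed lower-case so they compare against the lower-cased tokens
letters = {'h', 'i'}

# Lower-case each cell and split it on whitespace into a list of tokens
tokens = d['ids'].str.lower().str.split()

# Keep a row when its tokens share at least one element with letters
mask = tokens.apply(lambda toks: not letters.isdisjoint(toks))
result = d[mask]
print(result)
```

Because it compares whole tokens, a cell such as 'hi z' would not match 'h' here, whereas str.contains('h') would.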

Upvotes: 2
