Reputation: 3846
I want to filter rows where a cell's string contains any one of the values in a predefined set.
For example, for the following dataframe:
   ids ids2  vals
0  a h  a i     1
1  b z  n a     2
2  f z  c a     3
3  n i  n h     4
I want the following rows extracted (the rows which have 'h' or 'i' in the ids column):
   ids ids2  vals
0  a h  a i     1
3  n i  n h     4
Code to generate dataframe:
import pandas as pd
d = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a h', 'b z', 'f z', 'n i'], 'ids2': ['a i', 'n a', 'c a', 'n h']})
What I have done so far:
d[d['ids'].str.contains('h')|d['ids'].str.contains('i')]
Here the predefined set is small, and str.contains is case-sensitive. Is there a way to do this either case-insensitively or with some kind of list-contains method? I tried this:
d[len(re.findall('h|i',d['ids'].str,re.IGNORECASE)) > 0]
but it gives me TypeError: expected string or bytes-like object.
or this:
data[any(d['name'].str.contains(x) for x in ['h','i'])]
which gives the error: KeyError: 'name'
Can someone help me with this?
Upvotes: 1
Views: 4342
Reputation: 394031
You can do this easily by passing a regex that joins the terms with | (prefixing the mask with ~ would instead return the rows that do not match):
In [132]:
d[d['ids'].str.contains('h|i', case=False)]
Out[132]:
   ids ids2  vals
0  a h  a i     1
3  n i  n h     4
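When the predefined set is longer, the pattern can be built from a list instead of being typed by hand. This is a minimal sketch, assuming the terms should be matched literally (hence re.escape); the names terms and pattern are illustrative and do not come from the question:

```python
import re
import pandas as pd

d = pd.DataFrame({'vals': [1, 2, 3, 4],
                  'ids': ['a h', 'b z', 'f z', 'n i'],
                  'ids2': ['a i', 'n a', 'c a', 'n h']})

terms = ['h', 'i']  # hypothetical predefined set

# Escape each term so regex metacharacters are matched literally,
# then join the terms with | so that any one of them can match.
pattern = '|'.join(re.escape(t) for t in terms)

# case=False makes the match case-insensitive;
# na=False treats missing values as non-matches.
result = d[d['ids'].str.contains(pattern, case=False, na=False)]
print(result)
```

Note that this is a substring match, so a term such as 'h' would also match a cell like 'oh'.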
Upvotes: 1
Reputation: 341
Use case=False to make it case-insensitive:
d[d['ids'].str.contains('h', case=False)|d['ids'].str.contains('i',case=False)]
This is definitely a little roundabout, but it will work too (note that it compares whole space-separated tokens and is case-sensitive):
letters = ['h', 'i']
d[d['ids'].str.split().apply(lambda x: len(set(x).intersection(set(letters))))>0]
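If the token-based approach is preferred but case should be ignored, one possible variant lower-cases each cell before splitting. This is only a sketch; the helper name has_any is hypothetical, and it assumes the values in letters are already lower case:

```python
import pandas as pd

d = pd.DataFrame({'vals': [1, 2, 3, 4],
                  'ids': ['a h', 'b z', 'f z', 'n i'],
                  'ids2': ['a i', 'n a', 'c a', 'n h']})

letters = {'h', 'i'}  # assumed to be lower case already

def has_any(cell):
    """Return True if any whitespace-separated token of cell is in letters."""
    tokens = set(str(cell).lower().split())
    return not tokens.isdisjoint(letters)

# Build a boolean mask row by row, then use it to select rows.
mask = d['ids'].apply(has_any)
result = d[mask]
print(result)
```

Unlike str.contains, this only matches complete tokens, so a cell such as 'oh' would not be selected for the letter 'h'.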
Upvotes: 2