Reputation: 2495
I know it's quite straightforward to use df[col].str.contains() to check whether a column contains a certain substring.
What if I want to do it the other way around: check whether the column's value is contained in a longer string? I searched but couldn't find an answer. I thought this should be easy, since in pure Python we can simply write 'a' in 'abc'.
I tried df.isin, but it doesn't seem to be designed for this purpose.
Say I have a df that looks like this:
col1 col2
0 'apple' 'one'
1 'orange' 'two'
2 'banana' 'three'
I want to query this df for the rows where col1 is contained in the string 'appleorangefruits'. It should return the first two rows.
Upvotes: 3
Views: 1634
Reputation: 8826
Try:
>>> df[df.col1.apply(lambda x: x in 'appleorangefruits')]
col1 col2
0 apple one
1 orange two
Upvotes: 1
Reputation: 8631
You need:
longstring = 'appleorangefruits'
df.loc[df['col1'].apply(lambda x: x in longstring)]
Output:
col1 col2
0 apple one
1 orange two
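For anyone who wants to reproduce this end to end, here is a minimal sketch. It rebuilds the DataFrame from the question, assuming the values are plain strings without the quote characters shown there, and applies the same mask:

```python
import pandas as pd

# DataFrame from the question (assumed to hold plain strings)
df = pd.DataFrame({
    "col1": ["apple", "orange", "banana"],
    "col2": ["one", "two", "three"],
})
longstring = "appleorangefruits"

# Boolean mask: True where the col1 value occurs inside longstring
mask = df["col1"].apply(lambda x: x in longstring)
result = df.loc[mask]
print(result)
```

The mask is an ordinary boolean Series, so it can be combined with other conditions using & and | before indexing.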
Upvotes: 3
Reputation: 4744
As apply is notoriously slow, I thought I'd have a play with some other ideas.
If your "long_string" is relatively short and your DataFrame is massive, you could do something weird like this:
import pandas as pd
from random import choice

# Create a large DataFrame
df = pd.DataFrame(
    data={'test': [choice('abcdef') for i in range(10_000_000)]}
)
long_string = 'abcdnmlopqrtuvqwertyuiop'

def get_all_substrings(input_string):
    # Every contiguous, non-empty substring of input_string
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]

sub_strings = get_all_substrings(long_string)
# True where the column value is one of the substrings
df.test.isin(sub_strings)
This ran in about 300 ms versus 2.89 s for the apply(lambda a: a in 'longer string') approach from the other answers, which is roughly ten times quicker.
Note: I used the get_all_substrings function from How To Get All The Contiguous Substrings Of A String In Python?
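Applied to the data from the question, the same idea might look like the sketch below. It is a variation rather than the exact code above: it collects the substrings in a set to drop duplicates, and the DataFrame is rebuilt on the assumption that the values are plain strings. Keep in mind that a string of length n has up to n(n+1)/2 substrings, so this only pays off while the long string stays short.

```python
import pandas as pd

def get_all_substrings(input_string):
    # Every contiguous, non-empty substring, de-duplicated via a set
    length = len(input_string)
    return {
        input_string[i:j]
        for i in range(length)
        for j in range(i + 1, length + 1)
    }

df = pd.DataFrame({
    "col1": ["apple", "orange", "banana"],
    "col2": ["one", "two", "three"],
})
long_string = "appleorangefruits"

sub_strings = get_all_substrings(long_string)
# isin does exact matching against the precomputed substrings
mask = df["col1"].isin(sub_strings)
result = df[mask]
print(result)
```

One behavioural difference from the in operator: an empty string in the column would not match here, because the empty string is not among the generated substrings, whereas '' in 'abc' is True in plain Python.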
Upvotes: 4
Reputation: 76
You can call an apply on the column, i.e.:
df['your col'].apply(lambda a: a in 'longer string')
Upvotes: 4
Reputation: 491
If the string you are checking against is a constant, I believe you can achieve it by using DataFrame.apply:
df.apply(lambda row: row['mycol'] in 'mystring', axis=1)
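One thing to watch for with any of the x in 'string' variants: if the column holds missing values (None or NaN), the in check raises a TypeError because the left operand is not a string. A rough sketch of one way to guard against that, using a plain list comprehension instead of apply (the column name mycol follows the example above; target and the sample data are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with a missing value
df = pd.DataFrame({"mycol": ["apple", None, "banana"]})
target = "appleorangefruits"

# Non-string entries (None, NaN) are treated as "not contained"
mask = pd.Series(
    [isinstance(v, str) and v in target for v in df["mycol"]],
    index=df.index,
)
result = df[mask]
print(result)
```

Whether missing values should count as non-matches or be handled separately depends on your data, so treat the isinstance check as one possible choice.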
Upvotes: 2