Reputation: 83
I want to write a function that takes a list of strings, checks whether a particular column contains every one of them, and returns a boolean mask. I can easily do this for a single string, but I'm stumped on how to do it for a list of strings.
# Single String Example
def mask(x, df):
    return df.description.str.contains(x)
df[mask('sql', df)]
# Some kind of example of what I want (pseudocode)
def mask(x, df):
    return df.description.str.contains(x[0]) & df.description.str.contains(x[1]) & df.description.str.contains(x[2]) & ...
df[mask(['sql'], df)]
Any help would be appreciated :)
It looks like I figured out a way to do it. It's a little unorthodox, but it seems to work. Solution below:
import numpy as np
def mask(x, df):
    # Multiply the boolean masks element-wise: the product is 1 only where every string matches.
    X = np.prod([df.description.str.contains(i) for i in x], axis=0)
    return [True if i == 1 else False for i in X]
my_selection = df[mask(['sql', 'python'], df)]
Upvotes: 2
Views: 84
Reputation: 83
I managed to work out a solution:
import numpy as np
def mask(x, df):
    X = np.prod([df.description.str.contains(i) for i in x], axis=0)
    return [True if i == 1 else False for i in X]
mine = df[mask(['sql', 'python'], df)]
It's a little unorthodox, so if anyone has anything better, it would be appreciated.
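For comparison, here is a minimal sketch of the same "every string must match" idea that combines the boolean masks with & directly instead of multiplying them. The function name mask_all and the sample DataFrame are made up for illustration, and regex=False and na=False are assumptions about the desired behaviour (literal substrings, missing descriptions counted as non-matches), not something stated in the question.

```python
from functools import reduce
import operator

import pandas as pd


def mask_all(words, df):
    """Return a boolean Series that is True where df.description contains every word."""
    # One boolean Series per word; regex=False treats each word as a literal substring,
    # and na=False makes missing descriptions count as "no match".
    masks = [df["description"].str.contains(w, regex=False, na=False) for w in words]
    # Fold the masks together with element-wise AND. Starting from an all-True
    # Series means an empty word list selects every row.
    all_true = pd.Series(True, index=df.index)
    return reduce(operator.and_, masks, all_true)


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"description": ["sql and python", "python only", "sql only", None]})
selected = df[mask_all(["sql", "python"], df)]
```

The result is a regular boolean Series aligned to df.index, so it can be combined with other conditions using & or | before indexing.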
Upvotes: 0
Reputation: 71610
Try using:
def mask(x, df):
    return df.description.str.contains(''.join(map('(?=.*%s)'.__mod__, x)))
df[mask(['a', 'b'], df)]
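Each (?=.*<word>) is a lookahead that succeeds only if <word> appears somewhere later in the string, without consuming any characters. Placing several of them one after another therefore acts as an AND: the pattern matches only when every word is present.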
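To see the chained lookaheads in isolation, the sketch below builds the pattern with the standard re module and tries it on a few plain strings. The helper name build_all_words_pattern and the sample strings are hypothetical, and the use of re.escape is an addition not present in the answer above: it makes words containing regex metacharacters (such as c++) match literally.

```python
import re


def build_all_words_pattern(words):
    """Build a regex of chained lookaheads, one per word."""
    # re.escape is an added assumption: it makes metacharacters like '+' match literally.
    return "".join("(?=.*{})".format(re.escape(w)) for w in words)


pattern = build_all_words_pattern(["sql", "python"])

# Hypothetical sample strings, for illustration only.
texts = ["python and sql", "sql only", "c++ and sql"]
matches = [bool(re.search(pattern, t)) for t in texts]

# A word with metacharacters still works because it was escaped.
cpp_match = bool(re.search(build_all_words_pattern(["c++"]), "c++ and sql"))
```

Word order does not matter, since each lookahead scans from the start of the string. Note that . does not match newlines by default, so for multi-line descriptions the re.DOTALL flag would likely be needed.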
Upvotes: 1