Reputation: 2221
To check whether a single string is contained in the values of one column (for example, "abc" is contained in "abcdef"), the following code is useful:
df_filtered = df.filter(df.columnName.contains('abc'))
The result would be, for example, "_wordabc", "thisabce", "2abc1".
How can I check for multiple strings (for example ['ab1','cd2','ef3']) at the same time?
Ideally, I'm looking for something like this:
df_filtered = df.filter(df.columnName.contains(['ab1','cd2','ef3']))
The result would be, for example, "x_ab1", "_cd2_", "abef3".
Please post scalable solutions (no for loops, for example), because the aim is to check against a big list of around 1000 elements.
Upvotes: 0
Views: 1864
Reputation: 5480
All you need is isin
df_filtered = df.filter(df['columnName'].isin('word1','word2','word3'))
Edit
isin only matches whole values, so for substring matches you need the rlike function to achieve your result
words="(aaa|bbb|ccc)"
df.filter(df['columnName'].rlike(words))
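For a list of around 1000 terms you probably don't want to type the pattern by hand. Here is a minimal sketch that builds it from a Python list. The helper names build_pattern and filter_contains_any are made up for illustration, and df is assumed to be a PySpark DataFrame. One caveat: re.escape produces escapes for Python's regex syntax, while rlike uses Java regular expressions. For ordinary search terms the two should agree, but treat that as an assumption and check it if your terms contain unusual characters.

```python
import re


def build_pattern(words):
    """Join literal search terms into a single regex alternation."""
    # re.escape stops characters such as '.' or '+' in a term
    # from being interpreted as regex syntax.
    return "|".join(re.escape(w) for w in words)


def filter_contains_any(df, column, words):
    """Keep rows whose `column` value contains at least one of `words`.

    `df` is assumed to be a PySpark DataFrame (hypothetical helper).
    """
    # rlike performs an unanchored regex search, so a match anywhere
    # in the column value is enough.
    return df.filter(df[column].rlike(build_pattern(words)))


words = ['ab1', 'cd2', 'ef3']
pattern = build_pattern(words)
```

Usage would then be df_filtered = filter_contains_any(df, 'columnName', words). The whole list becomes one column expression, so there is no Python loop over rows. Note that an empty list yields an empty pattern, which matches every row, so you may want to guard against that case.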
Upvotes: 2