Reputation: 41
Is it possible to perform a dynamic "where/filter" on a DataFrame? I am running a "like" operation to remove items that match specific strings:
eventsDF.where(
    ~eventsDF.myColumn.like('FirstString%') &
    ~eventsDF.myColumn.like('anotherString%')
).count()
However, I need to filter based on strings that come from another DataFrame or list.
The solution I was going for (which doesn't really work) involves a function that receives an index:
# my_func[0] = "FirstString"
# my_func[1] = "anotherString"
def my_func(n):
    return str(item[n])

newDf.where(
    ~newDf.useragent.like(str(my_func(1))+'%')
).count()
But I'm struggling to make it work when passing a range, mainly because my_func then receives a list instead of an integer:
newDf.where(
    ~newDf.useragent.like(str(my_func([i for i in range(2)])+'%'))
).count()
I don't want to go down the path of using "exec" or "eval" to do this.
Upvotes: 2
Views: 1043
Reputation: 544
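Build a list of negated conditions, one per string (each entry in strings should already end with '%' if you want a prefix match):
str_likes = [~df.column.like(s) for s in strings]
then reduce it into one expression (in Python 3, reduce has to be imported with "from functools import reduce"):
reduce(lambda x, y: x & y, str_likes)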
It's a little bit ugly, but it does what you want. You can also do this in a for loop, like so:
bool_expr = ~df.column.like(strings[0])
for s in strings[1:]:
    bool_expr &= ~df.column.like(s)
df.where(bool_expr).count()
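Both variants above can be wrapped in a small helper. This is only a minimal sketch: the name exclude_prefixes and its parameters are made up for illustration, and it assumes the column object supports .like(), ~ and & the way a PySpark Column does.

```python
from functools import reduce
import operator


def exclude_prefixes(column, prefixes):
    """Return one condition that holds when `column` starts with none of `prefixes`.

    Hypothetical helper: `column` is assumed to behave like a PySpark Column
    (it has .like() and overloads ~ and &).
    """
    prefixes = list(prefixes)
    if not prefixes:
        # reduce() without an initial value fails on an empty sequence,
        # so reject that case with a clearer message.
        raise ValueError("at least one prefix is required")
    # One negated LIKE condition per prefix; the trailing '%' makes it a prefix match.
    conditions = [~column.like(p + "%") for p in prefixes]
    # Fold the list into cond1 & cond2 & ... using the overloaded & operator.
    return reduce(operator.and_, conditions)
```

With the question's names it would be used as newDf.where(exclude_prefixes(newDf.useragent, ["FirstString", "anotherString"])).count(). If the strings live in another DataFrame, they first have to be brought to the driver as a plain Python list, for example [row[0] for row in otherDf.select("pattern").collect()], where otherDf and pattern are placeholder names; this only makes sense when that list is small.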
Upvotes: 3