Reputation: 237
I want to filter a PySpark DataFrame, removing rows where any of the string columns in a list is empty.
df = df.where(all([col(x) != '' for x in col_list]))
This raises:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Upvotes: 4
Views: 233
Reputation: 1242
Since filter (or where) is a lazily evaluated transformation, we can merge multiple conditions by applying them one at a time, e.g.
from pyspark.sql.functions import col

for c in col_list:
    spark_df = spark_df.filter(col(c) != "")

spark_df.show()
This may be a bit more readable, but in the end it will be executed in exactly the same way as Sreeram's answer.
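If you want to check that yourself, explain() prints the query plan; with chained filters, Spark's optimizer typically merges them into a single Filter node (a quick sketch, assuming spark_df and col_list as above):

from pyspark.sql.functions import col

result = spark_df
for c in col_list:
    result = result.filter(col(c) != "")

# extended explain prints the optimized plan; the chained filters
# usually appear collapsed into one Filter node
result.explain(True)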
On a side note, removing rows with empty values would most often be done with
df.na.drop(how="any", subset=col_list)
but it only handles missing (null / None) values, not empty strings.
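If you need to drop rows that are either null or empty in one pass, a minimal sketch (assuming col_list holds the string column names) is to fold both checks into a single predicate:

from functools import reduce
from pyspark.sql import functions as F

# keep only rows where every listed column is non-null AND non-empty
cond = reduce(
    lambda a, b: a & b,
    [F.col(c).isNotNull() & (F.col(c) != "") for c in col_list],
)
df = df.where(cond)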
Upvotes: 0
Reputation: 11917
You can use reduce from functools to simulate all, like this:
from functools import reduce
from pyspark.sql import functions as F

spark_df.where(reduce(lambda a, b: a & b, (F.col(c) != '' for c in col_list))).show()
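Equivalently, operator.and_ can stand in for the lambda (a small variant, assuming the same col_list):

from functools import reduce
from operator import and_
from pyspark.sql import functions as F

# same merged predicate, written with operator.and_ instead of a lambda
spark_df.where(reduce(and_, [F.col(c) != '' for c in col_list])).show()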
Upvotes: 2