Mamaf

Reputation: 360

Using Spark DataFrame filter with a List of column names

I have to filter non-null column values in a Spark DataFrame using a List[String]:

val keyList = List("columnA", "columnB", "columnC", "columnD", ...)

For a single column named key, the syntax would be:

val nonNullDf = df.filter(col("key").isNotNull)

My question is: how can I apply keyList in the previous filter?

Upvotes: 1

Views: 1050

Answers (1)

mck

Reputation: 42422

You can generate a filter by doing a map-reduce on keyList.

Use and if you want to keep the rows where all of the columns are non-null, or use or if you want to keep the rows where at least one column is non-null.

val nonNullDf = df.filter(keyList.map(col(_).isNotNull).reduce(_ and _))
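Putting it together, a minimal sketch of both variants (the import and the variable names allNonNullDf / anyNonNullDf are illustrative, assuming df and keyList are defined as above):

import org.apache.spark.sql.functions.col

// keep rows where every column in keyList is non-null
val allNonNullDf = df.filter(keyList.map(col(_).isNotNull).reduce(_ and _))

// keep rows where at least one column in keyList is non-null
val anyNonNullDf = df.filter(keyList.map(col(_).isNotNull).reduce(_ or _))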

Upvotes: 1
