Swetha

Reputation: 177

How to check isEmpty on column data in Spark Scala

My data looks like :

[null,223433,WrappedArray(),null,460036382,0,home,home,home]

How do I check whether col3 is empty in a Spark SQL query? I tried to explode, but when I do, the rows with empty arrays disappear. Can someone suggest a way to do this?

I tried :

val homeSet = result.withColumn("subscriptionProvider", explode($"subscriptionProvider"))

where subscriptionProvider is the column holding an array of values (a WrappedArray), but some of the arrays can be empty. I need to get the rows where subscriptionProvider is null as well as the rows where the subscriptionProvider array contains "Comcast".

Upvotes: 4

Views: 20107

Answers (3)

Dedkov Vadim

Reputation: 436

To check whether an array column is empty, compare its size to zero:

size($"ArrayColumn") === 0
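In context, this expression goes inside a filter (or a when). A minimal sketch, assuming a DataFrame df with an array column named ArrayColumn:

```scala
import org.apache.spark.sql.functions.size

// Rows whose array column is empty
val emptyRows = df.filter(size($"ArrayColumn") === 0)

// Rows whose array column is non-empty
val nonEmptyRows = df.filter(size($"ArrayColumn") > 0)

// Note: for a null (rather than empty) array, size returns -1 by default,
// so null rows match neither condition above.
```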

Upvotes: 0

Justin Pihony

Reputation: 67075

LostInOverflow's answer is good for staying in the DataFrame mindset. However, whether size is efficient depends on how large your lists are. If you are going to have large lists, then dropping out to the RDD API and back into a DataFrame might be best:

val dfSchema = df.schema
// Keep only rows whose third column (index 2) holds a non-empty array
val filtered = df.rdd.filter(!_.getList[String](2).isEmpty)
sqlContext.createDataFrame(filtered, dfSchema)

Upvotes: 2

user6022341

Reputation:

Try:

import org.apache.spark.sql.functions._

// Replace empty arrays with an array holding a single null,
// so that explode does not drop those rows
val tmp = df.withColumn("subscriptionProvider",
  when(size($"subscriptionProvider") !== 0, $"subscriptionProvider")
    .otherwise(array(lit(null).cast("string"))))

tmp.withColumn("subscriptionProvider", explode($"subscriptionProvider"))
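To finish the original task (keeping the rows whose array was empty alongside the rows containing "Comcast"), a hedged sketch of a follow-up filter on the exploded result, where the value "Comcast" comes from the question:

```scala
val exploded = tmp.withColumn("subscriptionProvider", explode($"subscriptionProvider"))

// Null here marks rows whose original array was empty
exploded.filter($"subscriptionProvider".isNull || $"subscriptionProvider" === "Comcast")
```

In newer Spark versions (2.2+), explode_outer gives this row-preserving behavior directly, emitting null for empty or null arrays without the when/otherwise workaround.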

Upvotes: 9
