Reputation: 177
My data looks like:
[null,223433,WrappedArray(),null,460036382,0,home,home,home]
How do I check whether col3 is empty when querying in Spark SQL? I tried to explode, but when I do that the rows with empty arrays disappear. Can someone suggest a way to do this?
I tried:
val homeSet = result.withColumn("subscriptionProvider", explode($"subscriptionProvider"))
where subscriptionProvider (a WrappedArray()) is the column holding an array of values, but some of the arrays can be empty. I need to keep the rows whose subscriptionProvider array is empty (as null) as well as the rows whose subscriptionProvider array contains "Comcast".
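For reference, a minimal sketch that reproduces the behaviour described above (the toy values and column names are illustrative only; assumes a Spark shell with the SQL implicits in scope):

import org.apache.spark.sql.functions._

// Toy frame: one row with a non-empty array, one with an empty array
val result = Seq(
  (223433L, Seq("Comcast")),
  (460036382L, Seq.empty[String])
).toDF("id", "subscriptionProvider")

// A plain explode drops the second row because its array is empty
result.withColumn("subscriptionProvider", explode($"subscriptionProvider")).show()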
Upvotes: 4
Views: 20107
Reputation: 436
To check whether an array column is empty in Spark Scala:
size($"ArrayColumn") === 0
Upvotes: 0
Reputation: 67075
LostInOverflow's answer is good for staying in the DataFrame mindset. However, whether size is efficient depends on how large your lists are. If you are going to have large lists, then dropping out to the RDD API and back into a DataFrame might be best:
val dfSchema = df.schema
// Drop to the RDD API and keep only rows whose third column (index 2) is a non-empty list
val filtered = df.rdd.filter(!_.getList[String](2).isEmpty)
// Rebuild a DataFrame with the original schema
sqlContext.createDataFrame(filtered, dfSchema)
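Conversely, if the goal is to keep only the rows whose list is empty, the same approach works with the negation dropped (a sketch under the same assumptions as above):

// Keep only the rows whose array column (index 2) is empty
val emptyOnly = df.rdd.filter(_.getList[String](2).isEmpty)
sqlContext.createDataFrame(emptyOnly, dfSchema)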
Upvotes: 2
Reputation:
Try:
import org.apache.spark.sql.functions._

// Replace empty arrays with an array holding a single null so explode keeps the row
val tmp = df.withColumn("subscriptionProvider",
  when(size($"subscriptionProvider") !== 0, $"subscriptionProvider")
    .otherwise(array(lit(null).cast("string"))))

// Rows whose original array was empty now explode into a single null value
tmp.withColumn("subscriptionProvider", explode($"subscriptionProvider"))
Upvotes: 9