Reputation: 465
I'm working with the following dataframe:
df.printSchema()
root
|-- key_value: struct (nullable = true)
| |-- key: string (nullable = true)
| |-- value: string (nullable = true)
df.show(5)
|key_value
|[k1,v1]
|[k1,v2]
|[k2,v3]
|[k3,v6]
|[k4,v5]
I want to count the distinct keys in my dataframe, so I tried to use explode to build a dataframe with separate key and value columns, but it throws an exception:
val f=df.withColumn("k",explode(col("key_value")))
org.apache.spark.sql.AnalysisException: cannot resolve 'explode(`key_value`)' due to data type mismatch: input to function explode should be array or map type, not StructType(StructField(key,StringType,true), StructField(value,StringType,true));;
Any help?
Upvotes: 0
Views: 60
Reputation: 3863
You could do this:
import spark.implicits._
df.select($"key_value.key").distinct.count
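The explode function only applies to array or map columns. In this case, key_value is a struct and key is a string, so neither one is an array. You can select a struct field directly with dot notation instead.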
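As a rough, self-contained sketch of the same idea: the snippet below rebuilds the five sample rows from the question (the local SparkSession, the object name DistinctKeys, and the intermediate name flat are assumptions for illustration, not part of the original post), flattens the struct into top-level columns with "key_value.*", and counts the distinct keys.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

object DistinctKeys {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-keys")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the sample rows from the question as a single struct column.
    val df = Seq(("k1", "v1"), ("k1", "v2"), ("k2", "v3"), ("k3", "v6"), ("k4", "v5"))
      .toDF("key", "value")
      .select(struct(col("key"), col("value")).as("key_value"))

    // Expand the struct's fields into top-level columns "key" and "value".
    // No explode is needed because a struct holds exactly one value per field.
    val flat = df.select("key_value.*")

    // Count the distinct keys (k1, k2, k3, k4 in the sample rows).
    val distinctKeys = flat.select("key").distinct().count()
    println(distinctKeys) // prints 4 for the five sample rows

    spark.stop()
  }
}
```

If you only need the count, selecting $"key_value.key" directly, as above, is enough; flattening is mainly useful when you also want value as its own column.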
Upvotes: 1