Reputation: 478
I have a dataframe whose schema looks like this:
event: struct (nullable = true)
| | event_category: string (nullable = true)
| | event_name: string (nullable = true)
| | properties: struct (nullable = true)
| | | ErrorCode: string (nullable = true)
| | | ErrorDescription: string (nullable = true)
I am trying to explode the struct column properties using the following code:
df_json.withColumn("event_properties", explode($"event.properties"))
But it is throwing the following exception:
cannot resolve 'explode(`event`.`properties`)' due to data type mismatch: input to function explode should be array or map type, not StructType(StructField(IDFA,StringType,true),
How can I explode the properties column?
Upvotes: 15
Views: 32206
Reputation: 41957
You can use explode only on array or map columns, so you need to convert the properties struct to an array and then apply the explode function, as below:
import org.apache.spark.sql.functions._
df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)
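A minimal, self-contained sketch of this approach; the local SparkSession setup and the sample values ('404', 'Not Found', etc.) are assumptions for illustration, not from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("explode-struct").getOrCreate()
import spark.implicits._

// Build a tiny DataFrame with the same nested shape as the question's schema.
val df_json = spark.sql(
  """SELECT named_struct(
    |  'event_category', 'error',
    |  'event_name', 'load_failed',
    |  'properties', named_struct('ErrorCode', '404', 'ErrorDescription', 'Not Found')
    |) AS event""".stripMargin)

// array($"event.properties.*") wraps the struct's field values into an array
// (this only works when the fields share a type, as the strings do here),
// so explode then produces one output row per field value.
df_json
  .withColumn("event_properties", explode(array($"event.properties.*")))
  .select($"event_properties")
  .show(false)
```

Note that this flattens the struct's values into rows, one per field, rather than into separate columns.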
This should give you the desired result.
Upvotes: 12
Reputation: 2595
You may use the following to flatten the struct. As the error message states, explode does not work on a struct.
// Event is a case class you define to hold the flattened fields.
// Note: DataFrame.explode is deprecated in recent Spark versions.
case class Event(errorCode: String, errorDescription: String)

val explodeDF = parquetDF.explode($"event") {
  case Row(properties: Seq[Row @unchecked]) => properties.map { property =>
    val errorCode = property(0).asInstanceOf[String]
    val errorDescription = property(1).asInstanceOf[String]
    Event(errorCode, errorDescription)
  }
}.cache()
display(explodeDF)  // display is Databricks-specific; use explodeDF.show() elsewhere
Upvotes: 0
Reputation: 27373
As the error message says, you can only explode array or map types, not struct-type columns.
You can simply do
df_json.withColumn("event_properties", $"event.properties")
This will generate a new column event_properties, which is also of struct type.
If you want to convert every element of the struct to a new column, then you cannot use withColumn; you need to do a select with a wildcard *:
df_json.select($"event.properties.*")
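A short sketch of this wildcard-select flattening; the local SparkSession setup and sample values ('500', 'Internal Error') are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("flatten-struct").getOrCreate()
import spark.implicits._

// A minimal DataFrame mirroring the question's nested schema.
val df_json = spark.sql(
  """SELECT named_struct(
    |  'properties', named_struct('ErrorCode', '500', 'ErrorDescription', 'Internal Error')
    |) AS event""".stripMargin)

// The wildcard expands each field of the struct into its own top-level column.
val flat = df_json.select($"event.properties.*")
flat.printSchema()
flat.show(false)
```

Unlike the explode-based approach, this keeps one row per record and turns ErrorCode and ErrorDescription into ordinary columns.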
Upvotes: 8