Error while exploding a struct column in Spark

I have a dataframe whose schema looks like this:

event: struct (nullable = true)
|    | event_category: string (nullable = true)
|    | event_name: string (nullable = true)
|    | properties: struct (nullable = true)
|    |    | ErrorCode: string (nullable = true)
|    |    | ErrorDescription: string (nullable = true)

I am trying to explode the struct column properties using the following code:

df_json.withColumn("event_properties", explode($"event.properties"))

But it is throwing the following exception:

cannot resolve 'explode(`event`.`properties`)' due to data type mismatch: 
input to function explode should be array or map type, 
not StructType(StructField(IDFA,StringType,true),

How can I explode the properties column?

Upvotes: 15

Answers (3)

Ramesh Maharjan

Reputation: 41987

explode works only on array or map columns, so you need to convert the properties struct to an array first and then apply the explode function, as below:

import org.apache.spark.sql.functions._
df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)

This should give you the desired result.
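To make the approach above easier to try out, here is a minimal sketch on a one-row dataset shaped like the schema in the question. The sample values, the local SparkSession settings, and the way df_json is built are assumptions for illustration only; they are not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, explode, struct}

// Local session for illustration only; master and app name are assumptions.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("explode-struct-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical one-row dataset mirroring the schema in the question.
val df_json = Seq(("error", "login_failed", "E42", "Bad password"))
  .toDF("event_category", "event_name", "ErrorCode", "ErrorDescription")
  .select(
    struct(
      $"event_category",
      $"event_name",
      struct($"ErrorCode", $"ErrorDescription").as("properties")
    ).as("event")
  )

// array(...) collects the struct's fields into one array column;
// explode then emits one output row per array element.
val exploded = df_json.withColumn(
  "event_properties",
  explode(array($"event.properties.*"))
)

exploded.select("event_properties").show(false)
```

Note that the field names are lost in the array, and the fields need a common type for array to accept them, which holds here because both are strings.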

Upvotes: 12

Anurag Sharma

Reputation: 2605

You can use the following to flatten the struct. As the error message states, explode does not work on a struct.

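// Note: Dataset.explode is deprecated since Spark 2.0.
// Event, email and salary are not defined in this snippet; they must already be in scope.
val explodeDF = parquetDF.explode($"event") {
  case Row(properties: Seq[Row]) => properties.map { property =>
    // Read the two properties fields by position.
    val errorCode = property(0).asInstanceOf[String]
    val errorDescription = property(1).asInstanceOf[String]
    Event(errorCode, errorDescription, email, salary)
  }
}.cache()
// display is a Databricks notebook helper.
display(explodeDF)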

Upvotes: 0

Raphael Roth

Reputation: 27383

As the error message says, you can only explode array or map types, not struct-type columns.

You can just do:

df_json.withColumn("event_properties", $"event.properties")

This will generate a new column, event_properties, which is also of struct type.

If you want to convert every element of the struct to a new column, you cannot use withColumn; you need to do a select with a wildcard *:

df_json.select($"event.properties.*")
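The difference between the two calls can be seen in a small sketch. The sample row, the local SparkSession settings, and the extra event_name column kept in the select are assumptions added for illustration; only the withColumn and wildcard select calls come from the answer itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

// Local session for illustration only; master and app name are assumptions.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("flatten-struct-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical one-row dataset mirroring the schema in the question.
val df_json = Seq(("error", "login_failed", "E42", "Bad password"))
  .toDF("event_category", "event_name", "ErrorCode", "ErrorDescription")
  .select(
    struct(
      $"event_category",
      $"event_name",
      struct($"ErrorCode", $"ErrorDescription").as("properties")
    ).as("event")
  )

// withColumn keeps the struct intact as one extra column.
val nested = df_json.withColumn("event_properties", $"event.properties")

// The wildcard expands each struct field into its own top-level column;
// other columns can be selected alongside it.
val flat = df_json.select($"event.event_name", $"event.properties.*")

nested.printSchema()
flat.printSchema()
```

Here nested still has a struct-typed event_properties column, while flat has the plain string columns event_name, ErrorCode and ErrorDescription.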

Upvotes: 8
