shiva.n404

Reputation: 478

Error while exploding a struct column in Spark

I have a dataframe whose schema looks like this:

 |-- event: struct (nullable = true)
 |    |-- event_category: string (nullable = true)
 |    |-- event_name: string (nullable = true)
 |    |-- properties: struct (nullable = true)
 |    |    |-- ErrorCode: string (nullable = true)
 |    |    |-- ErrorDescription: string (nullable = true)
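To reproduce the problem, a DataFrame with this shape can be built from a JSON string. This is a minimal sketch that makes two assumptions. First, Spark is 2.2 or later, so that `spark.read.json` accepts a `Dataset[String]`. Second, the sample values ("app", "crash", "E42", "Timeout") are hypothetical. JSON schema inference sorts fields alphabetically, which happens to match the order above.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell; creates a local one otherwise
val spark = SparkSession.builder().master("local[*]").appName("explode-struct").getOrCreate()
import spark.implicits._

// Hypothetical sample record shaped like the schema in the question
val json = Seq(
  """{"event": {"event_category": "app", "event_name": "crash", "properties": {"ErrorCode": "E42", "ErrorDescription": "Timeout"}}}"""
)
val df_json = spark.read.json(json.toDS())

// Prints event as a struct containing the nested properties struct
df_json.printSchema()
```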

I am trying to explode the struct column properties using the following code:

df_json.withColumn("event_properties", explode($"event.properties"))

But it is throwing the following exception:

cannot resolve 'explode(`event`.`properties`)' due to data type mismatch: 
input to function explode should be array or map type, 
not StructType(StructField(IDFA,StringType,true),

How can I explode the properties column?

Upvotes: 15

Views: 32206

Answers (3)

Ramesh Maharjan

Reputation: 41957

explode only works on array or map columns, so you need to convert the properties struct to an array first and then apply explode, as below:

import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"..." column syntax
df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)

This should give you the desired result.
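Here is a self-contained version of the array-then-explode approach that shows what it produces. It is a sketch that assumes a hypothetical one-row input matching the question's schema. Note two limitations. `array(...)` requires every struct field to have the same type, which holds here because both fields are strings. Also, explode emits one row per field *value*, so the field names (ErrorCode, ErrorDescription) are not kept in the output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, explode}

val spark = SparkSession.builder().master("local[*]").appName("explode-array").getOrCreate()
import spark.implicits._

// Hypothetical one-row input shaped like the question's schema
val df_json = spark.read.json(Seq(
  """{"event": {"event_category": "app", "event_name": "crash", "properties": {"ErrorCode": "E42", "ErrorDescription": "Timeout"}}}"""
).toDS())

// event.properties.* expands to the struct's fields; array() packs them into
// one array<string> column, and explode() turns each element into its own row
val exploded = df_json.withColumn("event_properties", explode(array($"event.properties.*")))
exploded.select("event_properties").show(false)
```

With the sample input, the single input row becomes two rows, one holding "E42" and one holding "Timeout".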

Upvotes: 12

Anurag Sharma

Reputation: 2595

You can use the following to flatten the struct. As the error message states, explode does not work on a struct column.

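import org.apache.spark.sql.Row
import spark.implicits._
// Case class for the flattened output (field names chosen for illustration)
case class Event(errorCode: String, errorDescription: String)
// Dataset.explode is deprecated since Spark 2.0. The input column is the
// event struct, so each Row holds a single nested Row for the event.
val explodeDF = df_json.explode($"event") {
  case Row(event: Row) =>
    val property = event.getAs[Row]("properties")
    val errorCode = property.getAs[String]("ErrorCode")
    val errorDescription = property.getAs[String]("ErrorDescription")
    Seq(Event(errorCode, errorDescription))
}.cache()
explodeDF.show(false)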
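`Dataset.explode` has been deprecated since Spark 2.0.0. Its deprecation notice recommends `flatMap()`, or `select()` with `functions.explode()`, instead. Each event here yields exactly one (ErrorCode, ErrorDescription) pair, so the same flattening can be done with a plain select and then converted to a typed Dataset. This is a sketch. The `Event` case class and the sample record are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class for one flattened record
case class Event(errorCode: String, errorDescription: String)

val spark = SparkSession.builder().master("local[*]").appName("flatten-typed").getOrCreate()
import spark.implicits._

// Hypothetical one-row input shaped like the question's schema
val df_json = spark.read.json(Seq(
  """{"event": {"event_category": "app", "event_name": "crash", "properties": {"ErrorCode": "E42", "ErrorDescription": "Timeout"}}}"""
).toDS())

// Pull the nested fields out and rename them to match the case class,
// then convert to a typed Dataset[Event] without any deprecated API
val events = df_json
  .select($"event.properties.ErrorCode".as("errorCode"),
          $"event.properties.ErrorDescription".as("errorDescription"))
  .as[Event]
events.show(false)
```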

Upvotes: 0

Raphael Roth

Reputation: 27373

As the error message says, you can only explode array or map columns, not struct columns.

You can simply do:

df_json.withColumn("event_properties", $"event.properties")

This creates a new column, event_properties, which is also of struct type.

If you want to turn every field of the struct into its own column, you cannot use withColumn. Instead, use a select with the wildcard *:

df_json.select($"event.properties.*")
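The wildcard select above returns only the properties fields. If you also need the other event fields, you can list them in the same select. This is a sketch that assumes a hypothetical sample record. Each nested field selected by path keeps its leaf name as the column name.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("flatten-select").getOrCreate()
import spark.implicits._

// Hypothetical one-row input shaped like the question's schema
val df_json = spark.read.json(Seq(
  """{"event": {"event_category": "app", "event_name": "crash", "properties": {"ErrorCode": "E42", "ErrorDescription": "Timeout"}}}"""
).toDS())

// Keep the other event fields next to the expanded properties struct
val flat = df_json.select($"event.event_category", $"event.event_name", $"event.properties.*")
flat.printSchema()
flat.show(false)
```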

Upvotes: 8
