Reputation: 1106
I have the following Spark dataset with a nested struct type:
-- _1: struct (nullable = false)
| |-- _1: struct (nullable = false)
| | |-- _1: struct (nullable = false)
| | | |-- ni_number: string (nullable = true)
| | | |-- national_registration_number: string (nullable = true)
| | | |-- id_issuing_country: string (nullable = true)
| | | |-- doc_type_name: string (nullable = true)
| | | |-- brand: string (nullable = true)
| | | |-- company_name: string (nullable = true)
| | |-- _2: struct (nullable = true)
| | | |-- municipality: string (nullable = true)
| | | |-- country: string (nullable = true)
| |-- _2: struct (nullable = true)
| | |-- brand_name: string (nullable = true)
| | |-- puk: string (nullable = true)
|-- _2: struct (nullable = true)
| |-- customer_servicesegment: string (nullable = true)
| |-- customer_category: string (nullable = true)
My aim here is to do some flattening at the bottom of the StructType to get this target schema:
-- _1: struct (nullable = false)
| |-- _1: struct (nullable = false)
| | |-- _1: struct (nullable = false)
| | | |-- ni_number: string (nullable = true)
| | | |-- national_registration_number: string (nullable = true)
| | | |-- id_issuing_country: string (nullable = true)
| | | |-- doc_type_name: string (nullable = true)
| | | |-- brand: string (nullable = true)
| | | |-- company_name: string (nullable = true)
| | |-- _2: struct (nullable = true)
| | | |-- municipality: string (nullable = true)
| | | |-- country: string (nullable = true)
| |-- _2: struct (nullable = true)
| | |-- brand_name: string (nullable = true)
| | |-- puk: string (nullable = true)
| |-- _3: struct (nullable = true)
| | |-- customer_servicesegment: string (nullable = true)
| | |-- customer_category: string (nullable = true)
The part of the schema with the columns (customer_servicesegment, customer_category) should be at the same level as the one with the columns (brand_name, puk).
I thought the explode utility from Spark SQL could be used here, but I don't know where to apply it. Any help with this, please?
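For reference, a minimal sketch that reproduces the input schema (the case class names and sample values are hypothetical, and nullable flags may differ slightly):

case class IdDoc(ni_number: String, national_registration_number: String,
                 id_issuing_country: String, doc_type_name: String,
                 brand: String, company_name: String)
case class Address(municipality: String, country: String)
case class Account(brand_name: String, puk: String)
case class Segment(customer_servicesegment: String, customer_category: String)

// assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._

// Nested tuples produce the _1/_2 struct wrappers shown in the schema above
val df = Seq(
  (((IdDoc("NI1", "NRN1", "BE", "passport", "b1", "Acme"), Address("Brussels", "BE")),
    Account("b1", "0000")),
   Segment("residential", "gold"))
).toDF()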
Upvotes: 1
Views: 445
Reputation: 32710
If you have Spark 3.1+, you can use the withField column method to update the struct _1 like this:
import org.apache.spark.sql.functions.col

val df2 = df.withColumn("_1", col("_1").withField("_3", col("_2"))).drop("_2")
This adds the top-level column _2 as a new field named _3 inside the struct _1, then drops the original _2 column from the first level.
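As a quick check (assuming the df2 from above), the moved fields should now be reachable under _1._3:

// _3 should appear under _1, next to _1._2
df2.printSchema()
df2.select("_1._3.customer_servicesegment", "_1._3.customer_category").show()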
For older versions, you need to reconstruct the struct column _1:
import org.apache.spark.sql.functions.{col, struct}

val df2 = df.withColumn(
  "_1",
  struct(col("_1._1").as("_1"), col("_1._2").as("_2"), col("_2").as("_3"))
).drop("_2")
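If you'd rather not hardcode the existing fields, here is a more generic sketch (my own assumption: every top-level column other than _1 should be folded into _1 as additional numbered fields):

import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.StructType

// Existing fields of _1, kept under their original names
val innerFields = df.schema("_1").dataType.asInstanceOf[StructType].fieldNames
val kept = innerFields.map(f => col(s"_1.$f").as(f))

// Every other top-level column becomes the next numbered field of _1
val others = df.columns.filter(_ != "_1")
val moved = others.zipWithIndex.map { case (c, i) =>
  col(c).as(s"_${innerFields.length + i + 1}")
}

val df2 = df
  .withColumn("_1", struct((kept ++ moved): _*))
  .drop(others: _*)

For the schema in the question this produces the same result as the hardcoded version.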
Upvotes: 1