scalacode

Reputation: 1106

Update nested struct in spark dataset from another struct column

I have the following Spark dataset with a nested struct type:

-- _1: struct (nullable = false)
 |    |-- _1: struct (nullable = false)
 |    |    |-- _1: struct (nullable = false)
 |    |    |    |-- ni_number: string (nullable = true)
 |    |    |    |-- national_registration_number: string (nullable = true)
 |    |    |    |-- id_issuing_country: string (nullable = true)
 |    |    |    |-- doc_type_name: string (nullable = true)
 |    |    |    |-- brand: string (nullable = true)
 |    |    |    |-- company_name: string (nullable = true)
 |    |    |-- _2: struct (nullable = true)
 |    |    |    |-- municipality: string (nullable = true)
 |    |    |    |-- country: string (nullable = true)
 |    |-- _2: struct (nullable = true)
 |    |    |-- brand_name: string (nullable = true)
 |    |    |-- puk: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- customer_servicesegment: string (nullable = true)
 |    |-- customer_category: string (nullable = true)

My aim here is to do some flattening at the bottom of the StructType and reach this target schema:

-- _1: struct (nullable = false)
|    |-- _1: struct (nullable = false)
|    |    |-- _1: struct (nullable = false)
|    |    |    |-- ni_number: string (nullable = true)
|    |    |    |-- national_registration_number: string (nullable = true)
|    |    |    |-- id_issuing_country: string (nullable = true)
|    |    |    |-- doc_type_name: string (nullable = true)
|    |    |    |-- brand: string (nullable = true)
|    |    |    |-- company_name: string (nullable = true)
|    |    |-- _2: struct (nullable = true)
|    |    |    |-- municipality: string (nullable = true)
|    |    |    |-- country: string (nullable = true)
|    |-- _2: struct (nullable = true)
|    |    |-- brand_name: string (nullable = true)
|    |    |-- puk: string (nullable = true)
|    |-- _3: struct (nullable = true)
|    |    |-- customer_servicesegment: string (nullable = true)
|    |    |-- customer_category: string (nullable = true)

The part of the schema with the columns (customer_servicesegment, customer_category) should be at the same level as the part with the columns (brand_name, puk).

Perhaps the explode utility from Spark SQL can be used here, but I don't know where to put it.

Any help with this, please?

Upvotes: 1

Views: 445

Answers (1)

blackbishop

Reputation: 32710

If you have Spark 3.1+, you can use the withField column method to update the struct _1 like this:

import org.apache.spark.sql.functions.col

val df2 = df.withColumn("_1", col("_1").withField("_3", col("_2"))).drop("_2")

This adds the top-level column _2 as a new field named _3 inside the struct _1, then drops the original top-level column _2.
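As a quick end-to-end check, the 3.1+ approach can be sketched with a small Dataset of nested tuples (the sample values and the reduced field set are illustrative, not the asker's real data; nested tuples produce the same _1/_2 struct naming as in the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Three levels of nesting, mirroring the question's schema shape
val ds = Seq(
  (((("NI1", "BE"), ("Gent", "BE")), ("BrandX", "9999")), ("consumer", "gold"))
).toDS()

// Move the top-level _2 struct inside _1 as a new field _3
val result = ds.toDF()
  .withColumn("_1", col("_1").withField("_3", col("_2")))
  .drop("_2")

result.printSchema()  // _1 now contains _1, _2, and the appended _3 struct
```

Running printSchema on the result should show _3 nested under _1, alongside the original _1._1 and _1._2.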


For older Spark versions, you need to reconstruct the struct column _1 manually:

import org.apache.spark.sql.functions.{col, struct}

val df2 = df.withColumn(
  "_1",
  struct(col("_1._1").as("_1"), col("_1._2").as("_2"), col("_2").as("_3"))
).drop("_2")

Upvotes: 1
