How to add a new field to a two-level nested struct column

Question

I have a DataFrame with the following schema:

 root
     |-- ts: timestamp (nullable = true)
     |-- address_list: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: string (nullable = true)
     |    |    |-- active: integer (nullable = true)
     |    |    |-- address: array (nullable = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- street: string (nullable = true)
     |    |    |    |    |-- city: long (nullable = true)
     |    |    |    |    |-- state: integer (nullable = true)

I would like to add a new field, street_2, to the nested column address_list.address, between street and city.

This is the expected schema:

 root
     |-- ts: timestamp (nullable = true)
     |-- address_list: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: string (nullable = true)
     |    |    |-- active: integer (nullable = true)
     |    |    |-- address: array (nullable = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- street: string (nullable = true)
     |    |    |    |    |-- street_2: string (nullable = true)
     |    |    |    |    |-- city: long (nullable = true)
     |    |    |    |    |-- state: integer (nullable = true)

I tried using transform, but that appends the street_2 field to the end of each address_list element instead:

import org.apache.spark.sql.functions.{col, lit, transform}
df
  .withColumn("address_list", transform(col("address_list"), x => x.withField("street_2", lit(null).cast("string"))))

 root
     |-- ts: timestamp (nullable = true)
     |-- address_list: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: string (nullable = true)
     |    |    |-- active: integer (nullable = true)
     |    |    |-- address: array (nullable = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- street: string (nullable = true)
     |    |    |    |    |-- city: long (nullable = true)
     |    |    |    |    |-- state: integer (nullable = true)
     |    |    |-- street_2: string (nullable = true)

However, I want it inside address, inserted between street and city.


Answers (1)
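
One possible approach, sketched under the assumption of Spark 3.1 or later (where Column.withField is available): the attempt in the question calls withField on the outer struct, so the new field lands next to id, active and address. To reach the inner struct, transform can be nested, and because withField only appends new fields, the inner struct is rebuilt with struct(...) in the desired field order. Calling withField with the name of an existing field (address here) replaces that field rather than adding another one. The helper name addStreet2 is hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, struct, transform}

// Hypothetical helper: rebuilds every address struct with street_2
// placed between street and city.
def addStreet2(df: DataFrame): DataFrame =
  df.withColumn(
    "address_list",
    // Outer transform: visit each element of address_list.
    transform(col("address_list"), outer =>
      // Replace the existing "address" field of the outer struct.
      outer.withField(
        "address",
        // Inner transform: visit each element of address.
        transform(outer.getField("address"), inner =>
          // Field order inside struct(...) defines the resulting schema order.
          struct(
            inner.getField("street").as("street"),
            lit(null).cast("string").as("street_2"),
            inner.getField("city").as("city"),
            inner.getField("state").as("state")
          )
        )
      )
    )
  )
```

Calling addStreet2(df).printSchema() should then list street_2 between street and city. Two caveats with this sketch: every inner field has to be listed by hand, and struct(...) produces a non-null struct even where the original address element was null, so null elements become structs whose fields are all null.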
