Reputation: 51
I have a DataFrame with the following schema, where 'name' is a string column whose value is a complex JSON document containing arrays and structs.
Because the column is a string, I am not able to parse the data and write it into rows. So I am trying to convert the column's data type and then apply explode to parse the data.
Current:
root
|-- id: string (nullable = true)
|-- partitionNo: string (nullable = true)
|-- name: string (nullable = true)
Expected after conversion:
root
|-- id: string (nullable = true)
|-- partitionNo: string (nullable = true)
|-- name: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- extension: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- url: string (nullable = true)
| | | | |-- valueMetadata: struct (nullable = true)
| | | | | |-- modifiedDateTime: string (nullable = true)
| | | | | |-- code: string (nullable = true)
| | |-- lastName: string (nullable = true)
| | |-- firstName: array (nullable = true)
| | | |-- element: string (containsNull = true)
Upvotes: 0
Views: 474
Reputation: 42332
You can use from_json, but you need to provide a schema. The schema can be inferred automatically with some spaghetti code, which is needed because from_json only accepts the schema in the form of lit:
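import org.apache.spark.sql.functions.{from_json, lit, schema_of_json}
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named spark

val df2 = df.withColumn(
  "name",
  from_json(
    $"name",
    // the lines below generate the schema from the first row's JSON
    lit(
      df.select(
        schema_of_json(
          lit(
            df.select($"name").head()(0)
          )
        )
      ).head()(0)
    )
    // end of schema generation
  )
)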
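Since the question also mentions explode, here is a minimal end-to-end sketch of the same idea with the schema inference pulled out into named steps. The object name ParseNameColumn, the helper parseAndExplode, and the sample JSON are made up for illustration. Note that the schema is inferred from the first row only, so any field missing from that row will not appear in the parsed column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json, lit, schema_of_json}

object ParseNameColumn {
  // Hypothetical helper: parses the JSON string column "name" with a schema
  // inferred from the first row, then emits one row per array element.
  def parseAndExplode(df: DataFrame): DataFrame = {
    // Take the JSON text of the first row as the sample for inference.
    val sampleJson = df.select(col("name")).head().getString(0)
    // schema_of_json returns the inferred schema as a string.
    val schemaString = df.select(schema_of_json(lit(sampleJson))).head().getString(0)

    df.withColumn("name", from_json(col("name"), lit(schemaString)))
      .withColumn("name", explode(col("name")))
      .select(col("id"), col("partitionNo"), col("name.lastName"), col("name.firstName"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("parse-name").getOrCreate()
    import spark.implicits._

    // Made-up sample data shaped like the expected schema in the question.
    val sample =
      """[{"lastName":"Doe","firstName":["John"],"extension":[{"url":"u1","valueMetadata":{"modifiedDateTime":"2021-01-01","code":"A"}}]},{"lastName":"Roe","firstName":["Jane"],"extension":[]}]"""
    val df = Seq(("1", "0", sample)).toDF("id", "partitionNo", "name")

    val result = parseAndExplode(df)
    result.printSchema()
    result.show(truncate = false)

    spark.stop()
  }
}
```

If the rows do not all share the same JSON shape, it is probably safer to write the schema out by hand (or infer it from a representative row) instead of relying on whatever row happens to come first.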
Upvotes: 1