Rex

Reputation: 51

How to convert a DataFrame column type from string to (array and struct) in Spark

I have a DataFrame with the following schema, where 'name' is of string type and its value is a complex JSON containing arrays and structs.

With the string datatype I am not able to parse the data and write it into rows, so I am trying to convert the datatype and then apply explode to parse the data.

Current:
root
 |-- id: string (nullable = true)
 |-- partitionNo: string (nullable = true)
 |-- name: string (nullable = true)

Expected schema after conversion:
root
 |-- id: string (nullable = true)
 |-- partitionNo: string (nullable = true)
 |-- name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extension: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- url: string (nullable = true)
 |    |    |    |    |-- valueMetadata: struct (nullable = true)
 |    |    |    |    |    |-- modifiedDateTime: string (nullable = true)
 |    |    |    |    |    |-- code: string (nullable = true)
 |    |    |-- lastName: string (nullable = true)
 |    |    |-- firstName: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
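
For illustration, a row with this layout could be built like the following (a hypothetical sample; the values are made up and only the structure matters):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // hypothetical sample row; the JSON string in `name` matches the expected structure above
    val df = Seq(
      ("1", "0",
        """[{"extension":[{"url":"http://example.org/ext","valueMetadata":{"modifiedDateTime":"2021-01-01T00:00:00","code":"A"}}],"lastName":"Doe","firstName":["John"]}]""")
    ).toDF("id", "partitionNo", "name")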

Upvotes: 0

Views: 474

Answers (1)

mck

Reputation: 42332

You can use from_json, but you need to provide a schema. The schema can be inferred automatically with a somewhat convoluted expression, because the from_json variant used here only accepts the schema as a literal column (lit):

import org.apache.spark.sql.functions.{from_json, lit, schema_of_json}
import spark.implicits._   // assumes an active SparkSession named `spark`

val df2 = df.withColumn(
    "name",
    from_json(
        $"name",
        // the lines below infer the schema from the JSON string in the first row
        lit(
            df.select(
                schema_of_json(
                    lit(
                        df.select($"name").head()(0)
                    )
                )
            ).head()(0)
        )
        // end of schema generation
    )
)
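
If you then need one row per element, you can explode the parsed array; a minimal sketch, assuming the converted schema above:

    import org.apache.spark.sql.functions.{col, explode}

    // one row per element of the parsed `name` array; struct fields become addressable
    val exploded = df2
      .withColumn("name", explode(col("name")))
      .select(
        col("id"),
        col("partitionNo"),
        col("name.lastName"),
        col("name.firstName")
      )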

Upvotes: 1
