Reputation: 6697
I am having trouble processing JSON data in Spark.
The DataFrame has a column that has JSON in String format.
DF Schema:
root
|-- id: string (nullable = true)
|-- jsonString: string (nullable = true)
Sample jsonString:"{\"sample\":\"value\"}";
I want to convert this jsonString into a nested JSON object (a struct column), so that I can read and traverse the JSON data.
Target DF structure I am looking for is as follows.
root
|-- id: string (nullable = true)
|-- json: struct (nullable = true)
| |-- sample: string (nullable = true)
Appreciate any help.
Upvotes: 2
Views: 5013
Reputation: 23119
You can use the from_json function to parse the jsonString column into a struct. For this, you need to create a schema that describes the JSON.
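import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

// dummy data
val data = Seq(
  ("a", "{\"sample\":\"value1\"}"),
  ("b", "{\"sample\":\"value2\"}"),
  ("c", "{\"sample\":\"value3\"}")
).toDF("id", "jsonString")

// create schema for jsonString
val schema = StructType(StructField("sample", StringType, true) :: Nil)

// create new column with from_json using schema
data.withColumn("newCol", from_json($"jsonString", schema))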
Output Schema:
root
|-- id: string (nullable = true)
|-- jsonString: string (nullable = true)
|-- newCol: struct (nullable = true)
| |-- sample: string (nullable = true)
Output:
+---+-------------------+--------+
|id |jsonString         |newCol  |
+---+-------------------+--------+
|a  |{"sample":"value1"}|[value1]|
|b  |{"sample":"value2"}|[value2]|
|c  |{"sample":"value3"}|[value3]|
+---+-------------------+--------+
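To get exactly the target structure from the question (only id and a struct column named json), the parsed column can be given that name and the original string column dropped. The following is a minimal sketch, not a definitive implementation: the object name JsonColumnExample, the helper parseJson, and the local SparkSession settings are assumptions made for illustration, and the schema assumes the JSON only ever contains a single string field called sample.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical wrapper object, used only to keep the example self-contained.
object JsonColumnExample {

  // Schema of the JSON held in the jsonString column (assumed: one string field).
  val schema: StructType =
    StructType(StructField("sample", StringType, nullable = true) :: Nil)

  // Parse jsonString into a struct column named "json" and drop the raw string.
  def parseJson(df: DataFrame): DataFrame =
    df.withColumn("json", from_json(col("jsonString"), schema))
      .drop("jsonString")

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell a session already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-column-example")
      .getOrCreate()
    import spark.implicits._

    val data = Seq(
      ("a", "{\"sample\":\"value1\"}"),
      ("b", "{\"sample\":\"value2\"}")
    ).toDF("id", "jsonString")

    val parsed = parseJson(data)
    parsed.printSchema()

    // Nested fields can be traversed with dot notation.
    parsed.select(col("id"), col("json.sample")).show(false)

    spark.stop()
  }
}
```

Note that from_json yields null for rows whose string cannot be parsed with the given schema, so it may be worth checking for nulls in the json column if the input is not guaranteed to be well formed.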
Hope this helps!
Upvotes: 2