user1760178

Reputation: 6697

Spark Dataframe with JSON as String in it, to be cast as nested json

I am having trouble processing JSON data in Spark.

The DataFrame has a column that has JSON in String format.

DF Schema:

root
 |-- id: string (nullable = true)
 |-- jsonString: string (nullable = true)

Sample jsonString: "{\"sample\":\"value\"}"

I want to convert this jsonString into a nested JSON object (a struct column), so that I can read and traverse the JSON data.

Target DF structure I am looking for is as follows.

root
 |-- id: string (nullable = true)
 |-- json: struct (nullable = true)
 |   |-- sample: string (nullable = true)

Appreciate any help.

Upvotes: 2

Views: 5013

Answers (1)

koiralo

Reputation: 23119

You can use the from_json function to parse the jsonString column into a struct. For this, you need to define a schema that describes the JSON.

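import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}
// assumes an existing SparkSession named spark (as in spark-shell);
// needed for toDF and the $"..." column syntax
import spark.implicits._

//dummy data
val data = Seq(
  ("a", "{\"sample\":\"value1\"}"),
  ("b", "{\"sample\":\"value2\"}"),
  ("c", "{\"sample\":\"value3\"}")
).toDF("id", "jsonString")

//create schema for jsonString
val schema = StructType(StructField("sample", StringType, true) :: Nil)

//create new column with from_json using schema
data.withColumn("newCol", from_json($"jsonString", schema))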

Output Schema:

root
 |-- id: string (nullable = true)
 |-- jsonString: string (nullable = true)
 |-- newCol: struct (nullable = true)
 |    |-- sample: string (nullable = true)

Output:

+---+-------------------+--------+
|id |jsonString         |newCol  |
+---+-------------------+--------+
|a  |{"sample":"value1"}|[value1]|
|b  |{"sample":"value2"}|[value2]|
|c  |{"sample":"value3"}|[value3]|
+---+-------------------+--------+
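To get exactly the target structure from the question (only id and a struct column named json), one option is to select the parsed column under that name instead of adding it next to the original string. The following is a minimal, self-contained sketch and not the only way to do it; the object name, app name, local master, and the `parsed` value are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object JsonStringToStruct {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("json-string-to-struct")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the question's DataFrame: id plus a JSON string column.
    val data = Seq(
      ("a", "{\"sample\":\"value1\"}"),
      ("b", "{\"sample\":\"value2\"}")
    ).toDF("id", "jsonString")

    // Schema describing the JSON held in jsonString.
    val schema = StructType(StructField("sample", StringType, nullable = true) :: Nil)

    // Parse the string into a struct named "json" and leave out the raw string column.
    val parsed = data.select($"id", from_json($"jsonString", schema).as("json"))
    parsed.printSchema()

    // Nested fields of the struct can be addressed with dot notation.
    parsed.select($"id", $"json.sample").show(false)

    spark.stop()
  }
}
```

Rows whose jsonString cannot be parsed against the schema generally come back with a null json value rather than failing the job, so it may be worth filtering on `$"json".isNull` to spot bad records.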

Hope this helps!

Upvotes: 2
