I have the following JSON objects:
{
  "user_id": "123",
  "data": {
    "city": "New York"
  },
  "timestamp": "1563188698.31",
  "session_id": "6a793439-6535-4162-b333-647a6761636b"
}
{
  "user_id": "123",
  "data": {
    "name": "some_name",
    "age": "23",
    "occupation": "teacher"
  },
  "timestamp": "1563188698.31",
  "session_id": "6a793439-6535-4162-b333-647a6761636b"
}
I'm using val df = sqlContext.read.json("json")
to read the file into a DataFrame, which combines all the data attributes into a single data struct, like so:
root
|-- data: struct (nullable = true)
| |-- age: string (nullable = true)
| |-- city: string (nullable = true)
| |-- name: string (nullable = true)
| |-- occupation: string (nullable = true)
|-- session_id: string (nullable = true)
|-- timestamp: string (nullable = true)
|-- user_id: string (nullable = true)
Is it possible to transform the data field into a Map[String, String] type, so that each row keeps only the attributes that were present in its original JSON object?
Yes, you can achieve that by extracting a Map[String, String] from the JSON data, as shown next:
import org.apache.spark.sql.types.{MapType, StringType}
import org.apache.spark.sql.functions.{to_json, from_json}
import spark.implicits._ // needed for .toDS and the $"" column syntax

val jsonStr = """{
  "user_id": "123",
  "data": {
    "name": "some_name",
    "age": "23",
    "occupation": "teacher"
  },
  "timestamp": "1563188698.31",
  "session_id": "6a793439-6535-4162-b333-647a6761636b"
}"""

val df = spark.read.json(Seq(jsonStr).toDS)
val mappingSchema = MapType(StringType, StringType)

df.select(from_json(to_json($"data"), mappingSchema).as("map_data")).show(false)
//Output
// +-----------------------------------------------------+
// |map_data |
// +-----------------------------------------------------+
// |[age -> 23, name -> some_name, occupation -> teacher]|
// +-----------------------------------------------------+
First we serialize the data struct into a JSON string with to_json($"data"), then we parse that string and extract the Map with from_json(to_json($"data"), mappingSchema).
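Regarding the second part of the question: to_json omits null struct fields by default, so each row's map should end up containing only the attributes that were actually present in that row's original JSON object. Here is a minimal sketch (assuming the df and mappingSchema defined above; withMap is just an illustrative name) that applies the conversion in place while keeping the other columns:

// Sketch, not part of the original answer: replace the struct column with
// its map equivalent; user_id, timestamp and session_id pass through unchanged.
val withMap = df.withColumn("data", from_json(to_json($"data"), mappingSchema))
withMap.printSchema
// Expected shape (abridged):
// root
//  |-- data: map (nullable = true)
//  |    |-- key: string
//  |    |-- value: string (valueContainsNull = true)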
I'm not sure why you want to convert it to a Map[String, String], but see if the below helps.
// Read the multi-line JSON file and keep only the data struct.
val dataDF = spark.read.option("multiline", "true").json("madhu/user.json").select("data")

// Pull each attribute out of the struct into its own top-level column,
// then drop the original struct.
dataDF
  .withColumn("age", $"data"("age"))
  .withColumn("city", $"data"("city"))
  .withColumn("name", $"data"("name"))
  .withColumn("occupation", $"data"("occupation"))
  .drop("data")
  .show
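If the goal is simply to flatten the struct into top-level columns, a shorter variant (a sketch against the same dataDF, assuming nothing beyond what is defined above) expands every struct field at once:

// Sketch: expand all fields of the data struct in one go; any attribute
// missing for a given row simply comes out as null.
dataDF.select($"data.*").show

New attributes added to the JSON later show up automatically this way, without another withColumn call per field.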