Ajay Patil
Ajay Patil

Reputation: 159

Fetch the partial value from a column having key value pairs and assign it to new column in Spark Dataframe

I have a data frame as below

+----+-----------------------------+
|id  | att                         |
+----+-----------------------------+
| 25 | {"State":"abc","City":"xyz"}|
| 26 | null                        |
| 27 | {"State":"pqr"}             |
+----+-----------------------------+

I want a dataframe with columns id and city if the att column has city attribute else null

+----+------+
|id  | City | 
+----+------+
| 25 | xyz  |
| 26 | null |
| 27 | null |
+----+------+

Language : Scala

Upvotes: 1

Views: 53

Answers (1)

abiratsis
abiratsis

Reputation: 7336

You can use from_json to parse and convert your json data to Map. Then access the map item using one of:

  • getItem method of the Column class
  • default accessor, i.e map("map_key")
  • element_at function
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{MapType, StringType}

import sparkSession.implicits._
val df = Seq(
  (25, """{"State":"abc","City":"xyz"}"""),
  (26, null),
  (27, """{"State":"pqr"}""")
).toDF("id", "att")

val schema = MapType(StringType, StringType)

df.select($"id", from_json($"att", schema).getItem("City").as("City"))

//or df.select($"id", from_json($"att", schema)("City").as("City"))
//or df.select($"id", element_at(from_json($"att", schema), "City").as("City"))

// +---+----+
// | id|City|
// +---+----+
// | 25| xyz|
// | 26|null|
// | 27|null|
// +---+----+

Upvotes: 2

Related Questions