Nisha Gupta

Reputation: 38

How to deal with an ambiguous column in nested JSON in Apache Spark

I have the nested JSON below, which contains an ambiguous column (`name` appears both at the top level and inside `Delivery`). The objective is to rename one of the duplicated columns after reading:

[
  {
    "name": "Nish",
    "product": "Headphone",
    "Delivery": {
      "name": "Nisha",
      "address": "Chennai",
      "mob": "1234567"
    }
  }
]
val spark = SparkSession.builder.master("local")
    .appName("dealWithAmbigousColumnNestedJson").getOrCreate()

val readJson = spark.read.option("multiLine", true).json("input1.json")

val dropDF = readJson.select("*","Delivery.*").drop("Delivery")

I got this far, but I don't know how to proceed: after expanding `Delivery.*`, the result has two columns named `name`.

Upvotes: 0

Views: 239

Answers (1)

ernest_k

Reputation: 45339

You can simply use `withColumnRenamed` and change the name of one or both of those columns. The trick is to rename the top-level `name` *before* expanding `Delivery.*`, so the second rename unambiguously targets the nested one:

readJson.withColumnRenamed("name", "buyer_name")
        .select("*","Delivery.*")
        .withColumnRenamed("name", "delivery_name")
        .drop("Delivery")
        .show()

Which gives:

+----------+---------+-------+-------+-------------+
|buyer_name|  product|address|    mob|delivery_name|
+----------+---------+-------+-------+-------------+
|      Nish|Headphone|Chennai|1234567|        Nisha|
+----------+---------+-------+-------+-------------+
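An equivalent approach (a sketch, not part of the answer above) is to alias each field explicitly in a single `select`, so the two `name` columns never collide in the first place. Column names here match the question's JSON; `buyer_name` and `delivery_name` are the same illustrative names used above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local")
  .appName("renameNestedColumn").getOrCreate()

val readJson = spark.read.option("multiLine", true).json("input1.json")

// Alias each column explicitly instead of expanding with "*",
// so the top-level and nested "name" fields get distinct names up front.
val flatDF = readJson.select(
  col("name").as("buyer_name"),
  col("product"),
  col("Delivery.name").as("delivery_name"),
  col("Delivery.address"),
  col("Delivery.mob")
)

flatDF.show()
```

This avoids the rename-expand-rename dance at the cost of listing every column, which is more verbose but makes the resulting schema explicit.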

Upvotes: 1
