Nisha Gupta

Reputation: 38

How to deal with an ambiguous column in nested JSON in Apache Spark

I have the nested JSON below, which contains an ambiguous column (`name` appears both at the top level and inside `Delivery`). The objective is to rename one of the duplicated columns after reading:

[
  {
    "name": "Nish",
    "product": "Headphone",
    "Delivery": {
      "name": "Nisha",
      "address": "Chennai",
      "mob": "1234567"
    }
  }
]
val spark = SparkSession.builder.master("local")
    .appName("dealWithAmbigousColumnNestedJson").getOrCreate()

val readJson = spark.read.option("multiLine", true).json("input1.json")

val dropDF = readJson.select("*","Delivery.*").drop("Delivery")

I got this far, but I don't know how to proceed: after expanding `Delivery.*`, the result has two columns named `name`.

Upvotes: 0

Views: 239

Answers (1)

ernest_k

Reputation: 45339

You can simply use `withColumnRenamed` and change the name of one or both of those columns. The trick is to rename the top-level `name` *before* expanding `Delivery.*`, so the second rename unambiguously targets the nested one:

readJson.withColumnRenamed("name", "buyer_name")
        .select("*","Delivery.*")
        .withColumnRenamed("name", "delivery_name")
        .drop("Delivery")
        .show()

Which gives:

+----------+---------+-------+-------+-------------+
|buyer_name|  product|address|    mob|delivery_name|
+----------+---------+-------+-------+-------------+
|      Nish|Headphone|Chennai|1234567|        Nisha|
+----------+---------+-------+-------+-------------+
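An equivalent approach (a sketch, not part of the answer above) is to alias each field explicitly in a single `select`, so the two `name` columns never collide in the first place. Column names here match the question's JSON; `buyer_name` and `delivery_name` are the same illustrative names used above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local")
  .appName("renameNestedColumn").getOrCreate()

val readJson = spark.read.option("multiLine", true).json("input1.json")

// Alias each column explicitly instead of expanding with "*",
// so the top-level and nested "name" fields get distinct names up front.
val flatDF = readJson.select(
  col("name").as("buyer_name"),
  col("product"),
  col("Delivery.name").as("delivery_name"),
  col("Delivery.address"),
  col("Delivery.mob")
)

flatDF.show()
```

This avoids the rename-expand-rename dance at the cost of listing every column, which is more verbose but makes the resulting schema explicit.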

Upvotes: 1
