Reputation: 38
I have the nested JSON below with an ambiguous column. Basically, the objective is to rename one of the duplicated columns after reading the file.
[
  {
    "name": "Nish",
    "product": "Headphone",
    "Delivery": {
      "name": "Nisha",
      "address": "Chennai",
      "mob": "1234567"
    }
  }
]
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local")
  .appName("dealWithAmbiguousColumnNestedJson").getOrCreate()
// multiLine is needed because each JSON record spans multiple lines
val readJson = spark.read.option("multiLine", true).json("input1.json")
// Flattening the Delivery struct leaves two columns named "name"
val dropDF = readJson.select("*", "Delivery.*").drop("Delivery")
This is as far as I have got, but I don't know how to proceed further.
Upvotes: 0
Views: 239
Reputation: 45339
You can simply use withColumnRenamed to change the name of one or both of those columns:
readJson.withColumnRenamed("name", "buyer_name")  // rename the top-level "name" first
  .select("*", "Delivery.*")                      // flatten the Delivery struct
  .withColumnRenamed("name", "delivery_name")     // only the nested "name" is left to rename
  .drop("Delivery")
  .show()
Which gives:
+----------+---------+-------+-------+-------------+
|buyer_name| product|address| mob|delivery_name|
+----------+---------+-------+-------+-------------+
| Nish|Headphone|Chennai|1234567| Nisha|
+----------+---------+-------+-------+-------------+
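If you prefer to avoid the intermediate renames, here is a minimal alternative sketch (not part of the original answer; it assumes the same input1.json and field names) that selects the nested fields with explicit aliases so no ambiguous column is ever produced:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local")
  .appName("dealWithAmbiguousColumnNestedJson").getOrCreate()
val readJson = spark.read.option("multiLine", true).json("input1.json")

// Alias the two "name" fields up front; the remaining columns keep their names
readJson.select(
  col("name").as("buyer_name"),
  col("product"),
  col("Delivery.name").as("delivery_name"),
  col("Delivery.address"),
  col("Delivery.mob")
).show()

This should give the same five columns as above, with buyer_name and delivery_name disambiguated at selection time.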
Upvotes: 1