Changing the type of a nested JSON attribute

Question

scala> val df = spark.read.json("data.json")

scala> df.printSchema
root
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
 |-- **TimeStamp: string (nullable = true)**
 |-- id: string (nullable = true)


scala> val df1 = df.withColumn("TimeStamp", $"TimeStamp".cast(TimestampType))

scala> df1.printSchema
root
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
 |-- **TimeStamp: timestamp (nullable = true)** // WORKING AS EXPECTED
 |-- id: string (nullable = true)


scala> val df2 = df.withColumn("a.b.c", $"a.b.c".cast(DoubleType))

scala> df2.printSchema
root
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- id: string (nullable = true)
 |-- **a.b.c: double (nullable = true)** // DUPLICATE COLUMN ADDED

I'm trying to change the type of a nested JSON attribute within a DataFrame column. The change to the nested attribute is treated as a new column, which results in a duplicate column. The cast works fine for top-level attributes (TimeStamp) but not for nested ones (a.b.c). Any thoughts on this problem?

Answers (1)
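withColumn treats its first argument as a literal top-level column name, so "a.b.c" becomes a new column whose name happens to contain dots, and the nested field is left untouched. One way around this is to replace the whole top-level column a with a struct whose leaf has the new type. The snippet below is a minimal sketch for spark-shell (where spark and its implicits are already in scope). The inline JSON row is a made-up stand-in for data.json with the same shape as the schema in the question, and df2/df3 are just illustrative names.

```scala
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types.DoubleType

// Made-up stand-in for spark.read.json("data.json"); same shape as the question's schema.
val df = spark.read.json(
  Seq("""{"a":{"b":{"c":1}},"TimeStamp":"2017-01-01 00:00:00","id":"x"}""").toDS)

// Option 1: rebuild the struct by hand and overwrite the top-level column "a".
// Every field of a and a.b that should survive has to be listed; here there is only c.
val df2 = df.withColumn(
  "a",
  struct(struct($"a.b.c".cast(DoubleType).as("c")).as("b")))

// Option 2: cast the whole struct in one go using a type string.
// The string must spell out the full nested layout of "a".
val df3 = df.withColumn("a", $"a".cast("struct<b:struct<c:double>>"))

df2.printSchema()
df3.printSchema()
```

Both variants should report c as double under a.b, with no extra a.b.c column. They differ in null handling: rebuilding with struct(...) produces a non-null struct even for rows where a was null, while the cast keeps such rows null. On Spark 3.1 or later, $"a".withField("b.c", $"a.b.c".cast(DoubleType)) should also work and avoids listing sibling fields.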