Actual column name after flattening Nested JSON using PySpark

Question

I have flattened a nested JSON file using PySpark, and now the flattened column names carry the parent struct name as a prefix. I would like to get back the actual (leaf) column names.

The DataFrame has the following schema.

Before flattening:

root
 |-- x: string (nullable = true)
 |-- y: string (nullable = true)
 |-- foo: struct (nullable = true)
 |    |-- a: float (nullable = true)
 |    |-- b: float (nullable = true)
 |    |-- c: integer (nullable = true)

After flattening:

root
 |-- x: string (nullable = true)
 |-- y: string (nullable = true)
 |-- foo_a: float (nullable = true)
 |-- foo_b: float (nullable = true)
 |-- foo_c: integer (nullable = true)

Is it possible to keep only the actual column names in the DataFrame, without the parent prefix, as shown below?

root
 |-- x: string (nullable = true)
 |-- y: string (nullable = true)
 |-- a: float (nullable = true)
 |-- b: float (nullable = true)
 |-- c: integer (nullable = true)
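One possible way to get there, sketched under a few assumptions: the flattening step joined parent and child names with an underscore (as in `foo_a`), and the parent struct names are known. The helper names `strip_parent_prefix` and `rename_flattened` are hypothetical, introduced only for illustration. The only PySpark calls relied on are `DataFrame.columns` and `DataFrame.toDF(*cols)`. Because dropping the prefix can make two columns share a name, which is the ambiguity mentioned above, the sketch refuses to rename in that case.

```python
def strip_parent_prefix(columns, parents):
    """Drop a leading '<parent>_' from each column name.

    Assumes the flattening step produced names like 'foo_a' for field 'a'
    of struct 'foo'. Raises ValueError if two columns would end up with
    the same name after the prefix is removed.
    """
    renamed = []
    for name in columns:
        for parent in parents:
            prefix = parent + "_"
            if name.startswith(prefix):
                name = name[len(prefix):]
                break  # strip at most one parent prefix per column
        renamed.append(name)

    duplicates = sorted({n for n in renamed if renamed.count(n) > 1})
    if duplicates:
        raise ValueError(f"renaming would create duplicate columns: {duplicates}")
    return renamed


def rename_flattened(df, parents):
    """Apply the new names to a flattened PySpark DataFrame (hypothetical helper)."""
    # DataFrame.toDF(*cols) returns a new DataFrame with columns renamed by position.
    return df.toDF(*strip_parent_prefix(df.columns, parents))


# Column names taken from the flattened schema in the question.
flat_columns = ["x", "y", "foo_a", "foo_b", "foo_c"]
print(strip_parent_prefix(flat_columns, ["foo"]))  # ['x', 'y', 'a', 'b', 'c']
```

With a real DataFrame this would be called as `rename_flattened(flat_df, ["foo"])`. If the original nested DataFrame is still available, selecting `"x", "y", "foo.*"` from it may be simpler, since expanding a struct with `.*` keeps the leaf field names.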
