Nam Nguyễn Văn

Reputation: 1

Converting Nested JSON to DataFrame

I am trying to convert nested JSON to a DataFrame with spark.read.option("multiline", "true").json(file_path), but this code orders the column names alphabetically, which is not what I expected.

When I use spark.read.option("multiline", "true").json, the DataFrame comes back with this column order: CreatedAt, CreatedBy, IsDeleted, ModifiedAt, ModifiedBy, TypeName, id

The column order I expect is the order in the .json file: Id, TypeName, CreatedBy, CreatedAt, ModifiedBy, ModifiedAt, IsDeleted

How can I read the nested JSON with multiline and get the same column order as in the JSON file? PS: I don't want to define the schema manually (I want the schema to be dynamic).

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

# Initialize SparkSession
spark = SparkSession.builder \
    .appName("Read JSON to DataFrame") \
    .getOrCreate()

# Read JSON file into DataFrame to infer schema
json_df = spark.read.option("multiline", "true").json(FILE_PATH)

# Keep only the 'data' component of the JSON file
exploded_df = json_df.select(explode("data").alias("data"))

# Select the fields from the exploded DataFrame
data_df = exploded_df.select("data.*")
data_df.show()


Upvotes: 0

Views: 45

Answers (1)

Nishu Tayal

Reputation: 20840

By default, Spark sorts columns alphabetically when inferring the schema from a JSON file. There is no spark.read.option(...) setting that preserves the key order from the file.

If you want to retain the column order, you can either:

  • Specify the schema while reading the file with spark.read.schema(schema).json(..)
  • Use the select() function to define the column order as a follow-up step

Upvotes: 0
