vap0991
vap0991

Reputation: 113

Converting a dataframe columns into nested JSON structure using pyspark

I'm fairly new to pyspark

My code what I have tried so far

ConceptGriddf = spark.sql("""
          SELECT  DataID
                 ,collect_list(struct(ConceptGridName AS Title,
                                      named_struct("ZoneName",ZoneName
                                                   ,"Zone", Zone
                                                   ,"ZoneScore", ZoneScore)as Zones
                                                   ))ConceptGrid
          FROM OutputTable 
          GROUP BY DataID """)

if dbutils.widgets.get("Debug") == 'yes':
  display(ConceptGriddf.limit(10))

The table

DataID ConceptGridName Zone ZoneName ZoneScore
1       Reserved_89    y    Shared    0.58115    
1       Reserved_89    x    Unshared  0.4939 

Here's my current JSON output in the dataframe

"ConceptGrid": [
    {
        "Title": "Reserved_89",
        "Zones": {
            "ZoneName": "Shared",
            "Zone": "y",
            "ZoneScore": 0.58115
        }
    },
    {
        "Title": "Reserved_89",
        "Zones": {
            "ZoneName": "Unshared",
            "Zone": "x",
            "ZoneScore": 0.4939
        }
    }
  ]

Instead I want it to look something like this and I'm not able to achieve with the functions I'm using here

"ConceptGrid": [
      {
          "Title":"Reserved_89",
          "Zones":[
              {
              "ZoneName":"Shared",
              "Zone":"y",
              "ZoneScore":0.58115
              },
              {
                "ZoneName":"Unshared",
                "Zone":"x",
                "ZoneScore":0.4939
                }
            
            ]
        }
    ]

Upvotes: 0

Views: 77

Answers (1)

s.polam
s.polam

Reputation: 10372

Check below code.

 spark.sql("with data as (select ConceptGridName as Title,collect_list(struct(ZoneName,Zone,ZoneScore)) as zones from OutputTable group by ConceptGridName) select collect_list(struct(Title,zones)) as ConceptGrid from data").toJSON.show(false)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                         |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"ConceptGrid":[{"Title":"Reserved_89","zones":[{"ZoneName":"Shared","Zone":"y","ZoneScore":0.58115},{"ZoneName":"Unshared","Zone":"x","ZoneScore":0.4939}]}]}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

Upvotes: 1

Related Questions