Reputation: 113
I'm fairly new to pyspark
My code what I have tried so far
ConceptGriddf = spark.sql("""
SELECT DataID
,collect_list(struct(ConceptGridName AS Title,
named_struct("ZoneName",ZoneName
,"Zone", Zone
,"ZoneScore", ZoneScore)as Zones
))ConceptGrid
FROM OutputTable
GROUP BY DataID """)
if dbutils.widgets.get("Debug") == 'yes':
display(ConceptGriddf.limit(10))
The table
DataID ConceptGridName Zone ZoneName ZoneScore
1 Reserved_89 y Shared 0.58115
1 Reserved_89 x Unshared 0.4939
Here's my current JSON output in the dataframe
"ConceptGrid": [
{
"Title": "Reserved_89",
"Zones": {
"ZoneName": "Shared",
"Zone": "y",
"ZoneScore": 0.58115
}
},
{
"Title": "Reserved_89",
"Zones": {
"ZoneName": "Unshared",
"Zone": "x",
"ZoneScore": 0.4939
}
}
]
Instead I want it to look something like this and I'm not able to achieve with the functions I'm using here
"ConceptGrid": [
{
"Title":"Reserved_89",
"Zones":[
{
"ZoneName":"Shared",
"Zone":"y",
"ZoneScore":0.58115
},
{
"ZoneName":"Unshared",
"Zone":"x",
"ZoneScore":0.4939
}
]
}
]
Upvotes: 0
Views: 77
Reputation: 10372
Check below code.
spark.sql("with data as (select ConceptGridName as Title,collect_list(struct(ZoneName,Zone,ZoneScore)) as zones from OutputTable group by ConceptGridName) select collect_list(struct(Title,zones)) as ConceptGrid from data").toJSON.show(false)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"ConceptGrid":[{"Title":"Reserved_89","zones":[{"ZoneName":"Shared","Zone":"y","ZoneScore":0.58115},{"ZoneName":"Unshared","Zone":"x","ZoneScore":0.4939}]}]}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
Upvotes: 1