Sparker0i

Reputation: 1861

Spark Write to Json as Json Array

I have a DataFrame which after a lot of operations looks like:

+--------------------+
|              values|
+--------------------+
|[U5, -1.11115, 1,...|
|[U5, 7458.62418, ...|
|[U5, 171.61934, 1...|
|[U5, 221192.9, 1,...|
|[U5, 1842.27947, ...|
|[U5, 17842.82242,...|
|[U5, 2416.94825, ...|
|[U5, 616.19426, 1...|
|[U5, 1813.14912, ...|
|[U5, 18119.81628,...|
|[U5, 17923.19866,...|
|[U5, 46353.87881,...|
|[U5, 7844.85114, ...|
|[U5, -1.11115, 1,...|
|[U5, -1.11115, 1,...|
|[U5, -1.12131, 1,...|
|[U5, 3981.14464, ...|
|[U5, 439.417, 1, ...|
|[U5, 6966.99999, ...|
+--------------------+

When I write it to a JSON file, it looks like:

{"values":["U5","-1.11115","1","257346.7","1","1","1","-1.11115","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","326.3316","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","-1.11115","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","326.3316","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","326.3316","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373431","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
{"values":["U5","7458.62418","1","257346.7","1","1","1","7458.62418","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","46511.38222","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","7458.62418","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","46511.38222","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","46511.38222","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373441","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
{"values":["U5","171.61934","1","257346.7","1","1","1","171.61934","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","361193.3137","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","171.61934","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","361193.3137","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","361193.3137","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373453","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
{"values":["U5","221192.9","1","257346.7","1","1","1","221192.9","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","221192.9","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","419152.8592","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","221192.9","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","221192.9","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","419152.8592","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","419152.8592","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373461","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
...

Is there any operation I can apply to the DataFrame so that, when written to JSON, the output looks like:

{
    "values": [
["U5","-1.11115","1","257346.7","1","1","1","-1.11115","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","326.3316","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","-1.11115","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","326.3316","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","326.3316","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373431","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"],
["U5","7458.62418","1","257346.7","1","1","1","7458.62418","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","46511.38222","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","7458.62418","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","46511.38222","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","46511.38222","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373441","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"],
["U5","171.61934","1","257346.7","1","1","1","171.61934","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","361193.3137","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","171.61934","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","361193.3137","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","361193.3137","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373453","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]
        ...
    ]
}

Upvotes: 0

Views: 68

Answers (1)

Som

Reputation: 6338

Try this:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.collect_list

// sample DataFrame with an array column named "values"
val df = spark.sql("select values from values array('U5', '-1.11115'), array('U6', '-1.11115') T(values)")
df.show(false)
df.printSchema()

// collect all rows into a single array-of-arrays, then write that one row as JSON
df.agg(collect_list("values").as("values"))
  .write
  .mode(SaveMode.Overwrite)
  .json("/path")

/**
  * file written-
  * {"values":[["U5","-1.11115"],["U6","-1.11115"]]}
  */
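Two caveats worth noting. First, `collect_list` aggregates every row into a single record, so this only works when the combined data fits comfortably in one row. Second, Spark's JSON writer always emits line-delimited JSON, never the pretty-printed, indented layout shown in the question; if that exact layout is required, one option is to post-process the file Spark wrote. A minimal sketch in Python (the part-file name and the indentation width are assumptions, not something Spark guarantees):

```python
import json

def merge_json_lines(lines):
    """Merge JSON-lines records of the form {"values": [...]} into one
    pretty-printed object: {"values": [[...], [...], ...]}."""
    rows = [json.loads(line)["values"] for line in lines if line.strip()]
    return json.dumps({"values": rows}, indent=4)

# Usage: read the part file Spark wrote and rewrite it pretty-printed
# (the file name below is a hypothetical example).
# with open("part-00000.json") as f:
#     print(merge_json_lines(f))
```

This keeps Spark doing the heavy lifting and only reshapes the final output, which is usually cheaper than forcing Spark itself to produce a non-standard JSON layout.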

Upvotes: 2
