ashish.garg

Reputation: 291

How to convert DataFrame to Json?

I have a huge JSON file, a small part from it as follows:

{
    "socialNews": [{
        "adminTagIds": "",
        "fileIds": "",
        "departmentTagIds": "",
        ........
        ........
        "comments": [{
            "commentId": "",
            "newsId": "",
            "entityId": "",
            ....
            ....
        }]
    }]
    .....
    }

I have applied lateral view explode on socialNews as follows:

val rdd = sqlContext.jsonFile("file:///home/ashish/test")
rdd.registerTempTable("social")
val result = sqlContext.sql("select * from social LATERAL VIEW explode(socialNews) social AS comment")

Now I want to convert back this result (DataFrame) to JSON and save into a file, but I am not able to find any Scala API to do the conversion. Is there any standard library to do this or some way to figure it out?

Upvotes: 29

Views: 93894

Answers (5)

Ganesh

Reputation: 757

When you run your Spark job with --master local --deploy-mode client, then

df.write.json("path/to/file/data.json")

works. If you run on a cluster (--master yarn --deploy-mode cluster), the better approach is to write the data to AWS S3 or Azure Blob Storage and read it back from there:

df.write.json("s3://bucket/path/to/file/data.json")

Upvotes: 2

Chetan Tamballa

Reputation: 43

If you still can't figure out a way to convert a DataFrame into JSON, you can use the built-in Spark functions to_json or toJSON.

Let me know if you have a sample DataFrame and the JSON format you want to convert to.
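A minimal sketch of the two functions mentioned above (assuming Spark 2.x and a local SparkSession; the column names and sample rows are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

val spark = SparkSession.builder.master("local[*]").appName("toJsonDemo").getOrCreate()
import spark.implicits._

val df = Seq(("n1", "great post"), ("n2", "nice")).toDF("newsId", "comment")

// toJSON: serializes each whole row to a JSON string, yielding a Dataset[String]
val asStrings = df.toJSON
asStrings.show(false) // {"newsId":"n1","comment":"great post"} ...

// to_json: builds a JSON string *column* from a struct, staying inside the DataFrame
val asColumn = df.select(to_json(struct($"newsId", $"comment")).as("json"))
asColumn.show(false)

spark.stop()
```

toJSON is the simpler choice when you want to dump entire rows; to_json is useful when the JSON should be just one column alongside others.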

Upvotes: -2

abhijitcaps

Reputation: 594

sqlContext.read.json(dataFrame.toJSON)

Upvotes: 5

MrChristine

Reputation: 1551

If you have a DataFrame, there is an API to convert it back to an RDD[String] containing the JSON records.

val df = Seq((2012, 8, "Batman", 9.8), (2012, 8, "Hero", 8.7), (2012, 7, "Robot", 5.5), (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")
df.toJSON.saveAsTextFile("/tmp/jsonRecords")
df.toJSON.take(2).foreach(println)

This should be available from Spark 1.4 onward. Call toJSON on the result DataFrame you created.

The APIs available are listed here

Upvotes: 33

Nikita

Reputation: 4515

val result: DataFrame = sqlContext.read.json(path)
result.write.json("/yourPath")

The method write is in the class DataFrameWriter and should be accessible on DataFrame objects. Just make sure that your rdd is of type DataFrame and not of the deprecated type SchemaRDD. You can either provide an explicit type annotation, val data: DataFrame, or convert to a DataFrame with toDF().
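To make the type issue above concrete, here is a hedged sketch (assuming Spark 2.x, where the same write.json call is reached through a SparkSession; the case class, rows, and output path are made up for illustration):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type standing in for one exploded comment.
case class Comment(commentId: String, newsId: String)

val spark = SparkSession.builder.master("local[*]").appName("writeJsonDemo").getOrCreate()
import spark.implicits._

// The explicit DataFrame annotation catches a SchemaRDD/RDD mix-up at compile
// time; toDF() performs the conversion from a Seq (or RDD) of case classes.
val data: DataFrame = Seq(Comment("c1", "n1"), Comment("c2", "n1")).toDF()

// Writes one JSON object per line, one part-file per partition.
data.write.mode("overwrite").json("/tmp/comments-json")

spark.stop()
```

Reading the directory back with spark.read.json("/tmp/comments-json") should reproduce the original rows.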

Upvotes: 35
