toofrellik

Reputation: 1307

scala - convert each json row to table

Below is a sample row from my data file:

{"externalUserId":"f850bgv8-c638-4ab2-a68a d79375fa2091","externalUserPw":null,"ipaddr":null,"eventId":0,"userId":1713703316,"applicationId":489167,"eventType":201,"eventData":"{\"apps\":[\"com.happyadda.jalebi\"],\"appType\":2}","device":null,"version":"3.0.0-b1","bundleId":null,"appPlatform":null,"eventDate":"2017-01-22T13:46:30+05:30"}`

I have millions of such rows. If the entire file were a single JSON document I could use a JSON reader, but how can I handle multiple JSON rows in a single file and convert them to a table?

How can I convert this data to an SQL table with columns:

 |externalUserId |externalUserPw|ipaddr| eventId  |userId    |.......
 |---------------|--------------|------|----------|----------|.......
 |f850bgv8-..... |null          |null  |0         |1713703316|.......

Upvotes: 3

Views: 1932

Answers (1)

Yaron

Reputation: 10450

You can use Spark's built-in read.json functionality, which fits your case well, since each line contains one JSON object.

As an example, the following creates a DataFrame based on the content of a JSON file:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
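
Since each of your million rows sits on its own line, the same call should apply directly to your file. A minimal sketch, where the file path and app name are placeholders you would adjust to your setup:

import org.apache.spark.sql.SparkSession

// Create (or reuse) a SparkSession; in spark-shell it already exists as `spark`
val spark = SparkSession.builder()
  .appName("json-rows-to-table") // example app name
  .getOrCreate()

// Hypothetical path; point it at your file with one JSON object per line
val events = spark.read.json("/path/to/events.json")

// Spark SQL infers one column per JSON key (externalUserId, eventId, userId, ...)
events.printSchema()
events.show(5, truncate = false)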

More info: http://spark.apache.org/docs/2.1.0/sql-programming-guide.html#data-sources

Spark SQL can automatically infer the schema of a JSON dataset and load it as a Dataset[Row]. This conversion can be done using SparkSession.read.json() on either an RDD of String, or a JSON file.

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON. As a consequence, a regular multi-line JSON file will most often fail.
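
Since you want to query the result as an SQL table, one option (sketched below, with an arbitrary view name and an example filter on eventType) is to register the DataFrame as a temporary view and run Spark SQL against it:

// Expose the DataFrame under a table name so it can be queried with SQL
events.createOrReplaceTempView("events")

// The columns from your JSON keys are now available to plain SQL
val result = spark.sql(
  """SELECT externalUserId, externalUserPw, ipaddr, eventId, userId
    |FROM events
    |WHERE eventType = 201""".stripMargin)

result.show()

Note that eventData in your sample is itself a JSON-encoded string, so schema inference will leave it as a plain string column; extracting its inner fields would need a second parsing step (e.g. the from_json function available since Spark 2.1). If the final destination is an external relational database, the same DataFrame can also be written out with events.write.jdbc(...).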

Upvotes: 2
