Reputation: 1307
Below is a sample row from my data file:
{"externalUserId":"f850bgv8-c638-4ab2-a68a d79375fa2091","externalUserPw":null,"ipaddr":null,"eventId":0,"userId":1713703316,"applicationId":489167,"eventType":201,"eventData":"{\"apps\":[\"com.happyadda.jalebi\"],\"appType\":2}","device":null,"version":"3.0.0-b1","bundleId":null,"appPlatform":null,"eventDate":"2017-01-22T13:46:30+05:30"}`
I have millions of such rows. If the entire file were a single JSON document I could use a JSON reader, but how can I handle a file that holds one JSON object per line and convert the rows to a table?
How can I convert this data to a SQL table with columns such as:
|externalUserId |externalUserPw|ipaddr|eventId|userId    |.......
|---------------|--------------|------|-------|----------|.......
|f850bgv8-..... |null          |null  |0      |1713703316|.......
Upvotes: 3
Views: 1932
Reputation: 10450
You can use Spark's built-in read.json functionality, which seems to be a great fit for your case, since each line contains one JSON object.
As an example, the following creates a DataFrame based on the content of a JSON file:
val df = spark.read.json("examples/src/main/resources/people.json")
// Displays the content of the DataFrame to stdout
df.show()
More info: http://spark.apache.org/docs/2.1.0/sql-programming-guide.html#data-sources
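Applied to your file, a minimal sketch might look like this (the path /data/events.json, the view name events, and the JDBC connection details are placeholders, and a SparkSession named spark is assumed to be available, as in spark-shell):

// Read the newline-delimited JSON file; Spark infers the schema
// (externalUserId, externalUserPw, ipaddr, eventId, userId, ...) from the data.
val events = spark.read.json("/data/events.json")

// Expose the DataFrame as a temporary SQL table and query it.
events.createOrReplaceTempView("events")
spark.sql("SELECT externalUserId, eventId, userId FROM events").show()

// Or persist it to an external SQL database over JDBC
// (URL and credentials below are placeholders).
events.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/mydb")
  .option("dbtable", "events")
  .option("user", "username")
  .option("password", "password")
  .save()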
Spark SQL can automatically infer the schema of a JSON dataset and load it as a Dataset[Row]. This conversion can be done using SparkSession.read.json()
on either an RDD of String, or a JSON file.
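For example, if some rows are already in memory as strings rather than in a file, a quick sketch using the RDD-of-String variant that the 2.1 docs mention (the sample row is taken from the question):

// Build an RDD[String] where each element is one JSON object,
// then let read.json infer the schema from it.
val jsonRdd = spark.sparkContext.parallelize(Seq(
  """{"externalUserId":"f850bgv8-c638-4ab2-a68a-d79375fa2091","eventId":0,"userId":1713703316}"""
))
val fromRdd = spark.read.json(jsonRdd)
fromRdd.show()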
Note that the file that is offered as a JSON file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON. As a consequence, a regular multi-line JSON file will most often fail.
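To illustrate the difference: each line must be a complete object, like the sample row in the question. A pretty-printed JSON document spread over several lines will not parse with the default reader; Spark 2.2+ adds a multiLine option for that case, which is not available in the 2.1 release linked above (the path here is a placeholder):

// Only needed for regular (non-JSON-Lines) files, Spark 2.2+:
val multiLineDf = spark.read.option("multiLine", true).json("/data/pretty.json")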
Upvotes: 2