Reputation: 27
I am trying to create a DataFrame out of JSON data using the PySpark module, but I am not able to. I tried sqlContext.read.json, but I am not getting a proper result.
Sample JSON data:
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"[email protected]"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"[email protected]"
}
{
"userId":"thanks",
"jobTitleName":"Program Directory",
"firstName":"Tom",
"lastName":"Hanks",
"preferredFullName":"Tom Hanks",
"employeeCode":"E3",
"region":"CA",
"phoneNumber":"408-2222222",
"emailAddress":"[email protected]"
}
Expected output: the above data in table format. Can anyone help me with this?
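For reference, what I am attempting is roughly the following (the file name is only a placeholder for wherever the records above are saved):
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is an existing SparkContext
df = sqlContext.read.json("employees.json")  # placeholder path to the data above
df.show()  # does not come out as the expected columns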
Upvotes: 1
Views: 1570
Reputation: 122
You can do this with a SparkSession:
import json
from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; the app name is arbitrary
spark = SparkSession.builder.appName("json_to_df").getOrCreate()

my_json = [
    {
        "userId": "rirani",
        "jobTitleName": "Developer",
        "firstName": "Romin",
        "lastName": "Irani",
        "preferredFullName": "Romin Irani",
        "employeeCode": "E1",
        "region": "CA",
        "phoneNumber": "408-1234567",
        "emailAddress": "[email protected]"
    },
    {
        "userId": "nirani",
        "jobTitleName": "Developer",
        "firstName": "Neil",
        "lastName": "Irani",
        "preferredFullName": "Neil Irani",
        "employeeCode": "E2",
        "region": "CA",
        "phoneNumber": "408-1111111",
        "emailAddress": "[email protected]"
    },
    {
        "userId": "thanks",
        "jobTitleName": "Program Directory",
        "firstName": "Tom",
        "lastName": "Hanks",
        "preferredFullName": "Tom Hanks",
        "employeeCode": "E3",
        "region": "CA",
        "phoneNumber": "408-2222222",
        "emailAddress": "[email protected]"
    }
]

# spark.read.json expects a path or an RDD of JSON strings, not a Python list
# of dicts, so serialize each record and parallelize it first
json_rdd = spark.sparkContext.parallelize([json.dumps(record) for record in my_json])
json_df = spark.read.json(json_rdd)
json_df.show()
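If the records instead live in a JSON file, as the question suggests, a sketch along these lines should also work; the file name is an assumption, and the objects need to be wrapped in a JSON array (or written one per line) for Spark to parse them:
# Assumes employees.json contains the records from the question wrapped in a
# JSON array; multiLine=True lets Spark parse pretty-printed JSON
# (by default the reader expects one JSON object per line)
file_df = spark.read.json("employees.json", multiLine=True)
file_df.show()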
Upvotes: 2