How to read JSON file with correct format in PySpark?

Question

I have a JSON file that looks like this:

{"id":{"0":0,"1":1,"2":2,"3":3}, "name":{"0":"name0","1":"name1","2":"name2","3":"name3"}}

When I read it using PySpark like:

names = spark.read.json('data/names.json')

all the values end up in a single row, like this:


+--------------+--------------------+
|            id|                name|
+--------------+--------------------+
|{0, 1, 2, 3...|{name1, name2, name3...

How can I read it so that the values are on multiple rows?

radi · Accepted Answer

A quick hack is to read the JSON with pandas:

pandas_df = pandas.read_json('data/names.json')

and then load the result into Spark:

spark_df = spark.createDataFrame(pandas_df)

For a more comprehensive analysis of the problem, check this.
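If you would rather not go through pandas, here is a rough sketch of another route. By default spark.read.json expects one JSON object per line (JSON Lines), while the file in the question is a single column-oriented object, which is why each column is read as one struct. The sketch below reshapes that layout into one record per row using only the standard library. The helper names columns_to_records and to_json_lines are made up for illustration, and it assumes the file is small enough to load on the driver.

```python
import json


def columns_to_records(text):
    """Turn {"col": {"row_key": value, ...}, ...} into a list of row dicts."""
    columns = json.loads(text)

    # Collect row keys in first-seen order across all columns.
    row_keys = []
    seen = set()
    for values in columns.values():
        for key in values:
            if key not in seen:
                seen.add(key)
                row_keys.append(key)

    # Build one dict per row key; a cell missing from a column becomes None.
    return [
        {name: values.get(key) for name, values in columns.items()}
        for key in row_keys
    ]


def to_json_lines(records):
    """Serialize the records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# The sample from the question, inlined so the sketch runs on its own.
sample = (
    '{"id":{"0":0,"1":1,"2":2,"3":3}, '
    '"name":{"0":"name0","1":"name1","2":"name2","3":"name3"}}'
)

records = columns_to_records(sample)
json_lines = to_json_lines(records)
print(json_lines)
```

Writing json_lines to a file and pointing spark.read.json at that file should then give one row per record. Passing records to spark.createDataFrame is another option, though I have not checked how that behaves across Spark versions.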
