Reputation: 7
I have a JSON file that looks like this:
{"id":{"0":0,"1":1,"2":2,"3":3}, "name":{"0":"name0","1":"name1","2":"name2","3":"name3"}}
When I read it using PySpark like:
names = spark.read.json('data/names.json')
all the values end up in a single row, like this:
+--------------+--------------------+
|            id|                name|
+--------------+--------------------+
|{0, 1, 2, 3...|{name1, name2, name3...
How can I read it so that the values are on multiple rows?
Upvotes: 0
Views: 863
Reputation: 9308
This is an alternative, more native Spark solution.
First, use explode_outer to explode the id column, then look up the corresponding name value.
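from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, MapType, StringType, StructField, StructType

# Read each top-level object as a map of row key -> value.
schema = StructType([
    StructField('id', MapType(StringType(), IntegerType())),
    StructField('name', MapType(StringType(), StringType()))
])
df = spark.read.json('data/names.json', schema=schema)
# Exploding a map yields one row per entry, with key and value columns.
# The name map shares the same string keys, so look it up by id_k.
df = (df.select(F.explode_outer('id').alias('id_k', 'id_v'), 'name')
        .withColumn('name', F.col('name').getItem(F.col('id_k'))))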
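To make the reshaping easier to follow, here is a rough plain-Python sketch of the same idea, with no Spark involved: produce one row per entry of the id map, then fetch the name stored under the same row key. The to_rows helper and the raw variable are hypothetical names used only for illustration, and the output column names simply mirror the id_k / id_v aliases above.

```python
import json

# Same column-oriented payload as in the question, inlined as a string.
raw = '{"id":{"0":0,"1":1,"2":2,"3":3}, "name":{"0":"name0","1":"name1","2":"name2","3":"name3"}}'


def to_rows(columns):
    """Turn {column: {row_key: value}} into one dict per row key.

    Hypothetical helper, not part of any library.
    """
    ids = columns["id"]
    names = columns["name"]
    # One output row per entry of the id map (the "explode" step),
    # then fetch the name stored under the same row key (the lookup step).
    # dict.get returns None for a missing key, similar to a null in Spark.
    return [
        {"id_k": key, "id_v": value, "name": names.get(key)}
        for key, value in ids.items()
    ]


rows = to_rows(json.loads(raw))
for row in rows:
    print(row)
```

Each printed dict corresponds to one row of the DataFrame you are after; the Spark version does the same thing but stays distributed.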
Upvotes: 0