Reputation: 339
I am trying to parse a column in which each value is a list of JSON objects stored as a string, but even after trying multiple schemas built with StructType, StructField, etc., I cannot get a schema that works.
[{"event":"empCreation","count":"148"},{"event":"jobAssignment","count":"3"},{"event":"locationAssignment","count":"77"}]
[{"event":"empCreation","count":"334"},{"event":"jobAssignment","count":"33"},{"event":"locationAssignment","count":"73"}]
[{"event":"empCreation","count":"18"},{"event":"jobAssignment","count":"32"},{"event":"locationAssignment","count":"72"}]
Based on this SO post (Pyspark: Parse a column of json strings), I was able to derive the JSON schema, but even after applying the from_json function it still would not work.
Can you please help?
Upvotes: 0
Views: 145
Reputation: 339
Thanks so much @Lakshmanan, but I had to make a slight change to the schema, wrapping the struct in an ArrayType because each value is a list of objects:
eventCountSchema = ArrayType(StructType([StructField("event", StringType(), True), StructField("count", StringType(), True)]), True)
I then applied this schema to the DataFrame's complex data type column.
Upvotes: 0
Reputation: 1912
You can parse the given JSON with the schema definition below and read the JSON file as a DataFrame, passing the schema to the reader.
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> dschema = StructType([
...     StructField("event", StringType(), True),
...     StructField("count", StringType(), True)])
>>>
>>> df = spark.read.json('/<json_file_path>/json_file.json', schema=dschema)
>>>
>>> df.show()
+------------------+-----+
|             event|count|
+------------------+-----+
|       empCreation|  148|
|     jobAssignment|    3|
|locationAssignment|   77|
|       empCreation|  334|
|     jobAssignment|   33|
|locationAssignment|   73|
|       empCreation|   18|
|     jobAssignment|   32|
|locationAssignment|   72|
+------------------+-----+
>>>
Contents of the json file:
cat json_file.json
[{"event":"empCreation","count":"148"},{"event":"jobAssignment","count":"3"},{"event":"locationAssignment","count":"77"}]
[{"event":"empCreation","count":"334"},{"event":"jobAssignment","count":"33"},{"event":"locationAssignment","count":"73"}]
[{"event":"empCreation","count":"18"},{"event":"jobAssignment","count":"32"},{"event":"locationAssignment","count":"72"}]
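To make the shape of the data explicit: every line of the file is a top-level JSON array of objects, and the output above has one row per object. The plain-Python sketch below mimics that flattening with the standard json module only. It is an illustration of the data layout, not of how Spark reads the file internally, and the `lines` list simply repeats the file contents shown above.

```python
import json

# The three lines of json_file.json, repeated here so the sketch is self-contained.
lines = [
    '[{"event":"empCreation","count":"148"},{"event":"jobAssignment","count":"3"},{"event":"locationAssignment","count":"77"}]',
    '[{"event":"empCreation","count":"334"},{"event":"jobAssignment","count":"33"},{"event":"locationAssignment","count":"73"}]',
    '[{"event":"empCreation","count":"18"},{"event":"jobAssignment","count":"32"},{"event":"locationAssignment","count":"72"}]',
]

rows = []
for line in lines:
    records = json.loads(line)  # each line parses to a list of dicts
    for rec in records:
        # One output row per object; "count" stays a string because it is quoted in the JSON.
        rows.append((rec["event"], rec["count"]))

for event, count in rows:
    print(event, count)
```

This is also why a schema for a single column holding one of these strings needs an array wrapper around the struct, while the struct alone describes one output row.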
Upvotes: 1