SWDeveloper

Reputation: 339

Parse a string of JSONs in PySpark

I am trying to parse a column in which each value is a string containing a list of JSON objects, but even after trying multiple schemas built with StructType, StructField, etc., I am unable to get the schema right.

[{"event":"empCreation","count":"148"},{"event":"jobAssignment","count":"3"},{"event":"locationAssignment","count":"77"}]

[{"event":"empCreation","count":"334"},{"event":"jobAssignment","count":"33"},{"event":"locationAssignment","count":"73"}]

[{"event":"empCreation","count":"18"},{"event":"jobAssignment","count":"32"},{"event":"locationAssignment","count":"72"}]

Based on the SO post below, I was able to derive the JSON schema, but even after applying the from_json function it still wouldn't work.

Pyspark: Parse a column of json strings

Can you please help?

Upvotes: 0

Views: 145

Answers (2)

SWDeveloper

Reputation: 339

Thanks so much, @Lakshmanan, but I had to make a slight change to the schema:

eventCountSchema = ArrayType(StructType([StructField("event", StringType(),True),StructField("count", StringType(),True)]), True)

I then applied this schema to the DataFrame's complex datatype column.

Upvotes: 0

Lakshman Battini

Reputation: 1912

You can parse the given JSON with the schema definition below and read the JSON file as a DataFrame, passing the schema explicitly.

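>>> from pyspark.sql.types import StructType, StructField, StringType
>>> # Schema of a single element of each JSON array
>>> dschema = StructType([
...         StructField("event", StringType(),True),
...         StructField("count", StringType(),True)])
>>>
>>> # Each line of the file is a JSON array; the output below shows one row per array element
>>> df = spark.read.json('/<json_file_path>/json_file.json', schema=dschema)
>>>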
>>> df.show()
+------------------+-----+
|             event|count|
+------------------+-----+
|       empCreation|  148|
|     jobAssignment|    3|
|locationAssignment|   77|
|       empCreation|  334|
|     jobAssignment|   33|
|locationAssignment|   73|
|       empCreation|   18|
|     jobAssignment|   32|
|locationAssignment|   72|
+------------------+-----+


Contents of the json file:

cat json_file.json
[{"event":"empCreation","count":"148"},{"event":"jobAssignment","count":"3"},{"event":"locationAssignment","count":"77"}]
[{"event":"empCreation","count":"334"},{"event":"jobAssignment","count":"33"},{"event":"locationAssignment","count":"73"}]
[{"event":"empCreation","count":"18"},{"event":"jobAssignment","count":"32"},{"event":"locationAssignment","count":"72"}]

Upvotes: 1
