Pyspark Schema for Json file

Question

I am trying to read a complex json file into a spark dataframe . Spark recognizes the schema but mistakes a field as string which happens to be an empty array. (Not sure why it is String type when it has to be an array type) Below is a sample that i am expecting

arrayfield:[{"name":"somename"},{"address" : "someadress"}]

Right now the data is as below

arrayfield:[]

what this does to my code is that when ever i try querying arrayfield.name it fails. I know i can input a schema while reading the file but since the json structure is really complex writing it from scratch doesn't really work out. I tried getting the schema using df.schema(which displays in StructType) and modifying it as per my requirement but how do i pass back a string into a StructType ? This might be really silly but i am finding it hard to fix this. Are there any tool / utility which would help me generate the strutType

Pyspark Schema for Json file

Answers (1)

Related Questions