guest1

Reputation: 46

JSON conversion within PySpark: Cannot parse the schema in JSON format: Failed to convert the JSON string (big JSON string) to a data type

I'm having trouble with JSON conversion in PySpark when working with complex nested-struct columns. The schema argument to from_json doesn't seem to behave as expected. Example:

import pyspark.sql.functions as f

# Build a DataFrame with an array-of-structs column
df = spark.createDataFrame([[1, 'a'], [2, 'b'], [3, 'c']], ['rownum', 'rowchar']) \
    .withColumn('struct', f.expr("transform(array(1,2,3), i -> named_struct('a1', rownum*i, 'a2', rownum*i*2))"))
df.display()

# Round-trip the column through JSON -- these from_json calls fail with the error below
df.withColumn('struct', f.to_json('struct')) \
  .withColumn('struct', f.from_json('struct', df.schema['struct'])).display()
df.withColumn('struct', f.to_json('struct')) \
  .withColumn('struct', f.from_json('struct', df.select('struct').schema)).display()

fails with

Cannot parse the schema in JSON format: Failed to convert the JSON string (big JSON string) to a data type

Not sure if this is a syntax error on my end, an edge case that's failing, the wrong way to do things, or something else.

Upvotes: 0

Views: 1495

Answers (1)

blackbishop

Reputation: 32640

You're not passing the correct schema to from_json: df.schema["struct"] is a StructField (the whole field, including its name and nullability), while from_json expects a DataType. Use the field's dataType attribute instead:

df.withColumn('struct', f.to_json('struct')) \
  .withColumn('struct', f.from_json('struct', df.schema["struct"].dataType)) \
  .display()
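
As a side note, from_json also accepts a DDL-formatted schema string, so the following sketch (untested against your data, but using only standard PySpark methods) should be equivalent:

# Derive a DDL string from the field's data type,
# e.g. 'array<struct<a1:bigint,a2:bigint>>'
ddl = df.schema['struct'].dataType.simpleString()
df.withColumn('struct', f.to_json('struct')) \
  .withColumn('struct', f.from_json('struct', ddl)) \
  .display()

A DDL string can be handier when the schema needs to be stored as configuration or passed between jobs.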

Upvotes: 1
