callmeGuy

Reputation: 1034

How to access JSON values from PySpark dataframes with default values?

I have a Spark dataframe with a JSON string in one of its columns. My task is to turn this dataframe into a columnar dataframe. The problem is that the JSON is dynamic and its structure always changes. What I would like to do is attempt to take values from it and, in case it does not have them, return a default value. Is there an option for this in the dataframe API? This is how I am taking values out of the JSON; the problem is that if one of the levels changes name or structure, it will fail.

columnar_df = df.select(
    col('json')['level1'].alias('json_level1'),
    col('json')['level1']['level2a'].alias('json_level1_level2a'),
    col('json')['level1']['level2b'].alias('json_levelb'),
)

Upvotes: 0

Views: 268

Answers (1)

Jason Tapia Diaz

Reputation: 41

You can do something like that with json_tuple:

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.json_tuple

df.select(json_tuple(col("json"), *all_the_fields_you_want))

Upvotes: 1
