Reputation: 1034
I have a Spark dataframe which has a JSON in one of the columns. My task is to turn this dataframe into a columnar type of dataframe. The problem is that the JSON is dynamic and it always changes structure. What I would like to do is attempt to take values from it and, in case it does not have them, return a default value. Is there an option for this in the dataframe? This is how I am taking values out of the JSON; the problem is that if one of the levels changes name or structure, it will fail.
from pyspark.sql.functions import col

columnar_df = df.select(
    col('json')['level1'].alias('json_level1'),
    col('json')['level1']['level2a'].alias('json_level1_level2a'),
    col('json')['level1']['level2b'].alias('json_levelb'),
)
Upvotes: 0
Views: 268
Reputation: 41
You can do something like that with `json_tuple`:
https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.json_tuple

from pyspark.sql.functions import col, json_tuple

df.select(json_tuple(col("json"), *all_the_fields_you_want))
Upvotes: 1