Reputation: 31744
I have a dataframe with two columns. Each column contains json.
cola | colb |
---|---|
{"name":"Adam", "age": 23} | {"country" : "USA"} |
I wish to convert it to:
cola_name | cola_age | colb_country |
---|---|---|
Adam | 23 | USA |
How do I do this?
The approach I have in mind is: In the original dataframe, If I can merge both the json to a single json object. I can then obtain the intended result
spark.read.json(df.select("merged_column").as[String])
But cant find an easy way of merging two json object to single json object in spark
Update: The contents of the json is not known pre-hand. Looking for a way to auto-detect schema
Upvotes: 0
Views: 708
Reputation: 1857
I'm more familiar with pyspark syntax. I think this works:
import pyspark.sql.functions as f
from pyspark.sql.types import *
schema_cola = StructType([
StructField('name', StringType(), True),
StructField('age', IntegerType(), True)
])
schema_colb = StructType([
StructField('country', StringType(), True)
])
df = spark.createDataFrame([('{"name":"Adam", "age": 23}', '{"country" : "USA"}')], ['cola', 'colb'])
display(df
.withColumn('cola_struct', f.from_json(f.col('cola'), schema_cola))
.withColumn('colb_struct', f.from_json(f.col('colb'), schema_colb))
.select(f.col('cola_struct.*'), f.col('colb_struct.*'))
)
Upvotes: 1