Reputation: 325
I have a below sample data,as a dataframe
id|data
1,{"a":["xyz","x"],"b":["abc","ab"]}
2,{"a":["zy","y"],"b":["ac","bc"]}
This is my expected output:
id|details
1,abc
2,ac
This is what I have come with till now,
dataframe.withColumn("details",from_json($"data",StructType(Seq(StructField("b",ArrayType(StringType),true)))))
This is the output I am getting right now
1,[abc,ab]
2,[ac,bc]
Also, this is just a sample, I have more than 100 columns so cannot use select clause. Want to resolve this using withColumn.
Upvotes: 0
Views: 575
Reputation: 7986
Assuming you want to have the first element of the b
array, based on details
column you have already computed, simply select the first element of b
:
dataframe.withColumn("details", expr("details.b[0]"))
You also can use getField
and getItem
functions directly on the result of from_json
dataframe.withColumn("details",from_json($"data",StructType(Seq(StructField("b",ArrayType(StringType),true)))).getField("b").getItem(0))
Upvotes: 2