Reputation: 73
I have the following data in a DataFrame:
aaa | bbb  | ccc | ddd | eee
-----------------------------
100 | xxxx | 123 | yyy | 2017
100 | yyyy | 345 | zzz | 2017
200 | rrrr | 500 | qqq | 2017
300 | uuuu | 200 | ttt | 2017
200 | iiii | 500 | ooo | 2017
I want to get the following result, with one JSON record per value of aaa:
{100,[{xxxx:{123,yyy}},{yyyy:{345,zzz}}],2017}
{200,[{rrrr:{500,qqq}},{iiii:{500,ooo}}],2017}
{300,[{uuuu:{200,ttt}}],2017}
Kindly help
Upvotes: 2
Views: 3613
Reputation: 7605
You can create a map whose values are either constants, using lit(), or taken from other columns in the DataFrame, using $"col_name", like this:
val new_df = df.withColumn("map_feature", map(lit("key1"), lit("value1"), lit("key2"), $"col2"))
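For reference, here is a self-contained sketch of the same call that can be pasted into a Scala script or spark-shell. The local SparkSession, the sample rows, and the column names col1 and col2 are assumptions made up for illustration; they are not from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, map}

// Local session for illustration only (assumption: no cluster is needed)
val spark = SparkSession.builder().master("local[1]").appName("map-sketch").getOrCreate()
import spark.implicits._ // enables the $"col" syntax and toDF

// Hypothetical two-column frame; "col1" and "col2" are placeholder names
val df = Seq((1, "a"), (2, "b")).toDF("col1", "col2")

// "key1" maps to a constant; "key2" maps to the value of col2 in the same row
val new_df = df.withColumn("map_feature", map(lit("key1"), lit("value1"), lit("key2"), $"col2"))

new_df.show(false)
```

Note that map() expects all keys to share one type and all values to share one type, which is why both values here are strings.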
Upvotes: 0
Reputation: 16076
This works:
val df = data
  .withColumn("cd", array('ccc, 'ddd))             // combine ccc and ddd into one array
  .withColumn("valuesMap", map('bbb, 'cd))         // map bbb -> [ccc, ddd]
  .withColumn("values", collect_list('valuesMap)   // collect all maps that share the same aaa
    .over(Window.partitionBy('aaa)))
  .withColumn("eee", first('eee)                   // eee is constant per aaa, so take the first value in the window
    .over(Window.partitionBy('aaa)))
  .select("aaa", "values", "eee")                  // keep only the columns the question asks for
  .select(to_json(struct("aaa", "values", "eee")).as("value")) // serialize each row to JSON
  .distinct()                                      // the window keeps one row per input row, so drop the duplicates
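Make sure you have these imports:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._ // needed for the 'aaa column syntax; assumes a SparkSession named spark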
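As an alternative to the window version, the same shape can probably be reached with a plain groupBy, which returns one row per aaa directly. This is only a sketch under some assumptions: the local SparkSession is for illustration, and the sample rows load every column as a string so that array(ccc, ddd) needs no cast.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, collect_list, first, map, struct, to_json}

// Local session for illustration only
val spark = SparkSession.builder().master("local[1]").appName("groupby-sketch").getOrCreate()
import spark.implicits._

// Sample rows from the question, all loaded as strings (an assumption)
val data = Seq(
  ("100", "xxxx", "123", "yyy", "2017"),
  ("100", "yyyy", "345", "zzz", "2017"),
  ("200", "rrrr", "500", "qqq", "2017"),
  ("300", "uuuu", "200", "ttt", "2017"),
  ("200", "iiii", "500", "ooo", "2017")
).toDF("aaa", "bbb", "ccc", "ddd", "eee")

val result = data
  .groupBy($"aaa")
  .agg(
    // one map per input row: bbb -> [ccc, ddd], gathered into a list per aaa
    collect_list(map($"bbb", array($"ccc", $"ddd"))).as("values"),
    // eee is assumed constant within each aaa group
    first($"eee").as("eee"))
  // serialize each grouped row into a single JSON string
  .select(to_json(struct($"aaa", $"values", $"eee")).as("value"))

result.show(false)
```

The order of the maps inside each list is not guaranteed, since collect_list depends on the order in which rows arrive.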
Upvotes: 4