Reputation: 317
I have the following data structure, in which the last column (columnname) holds a column name and the other columns hold its possible values. It looks something like this:
+-----------+----------------+-----------+-------------+
|col1       |col2            |col3       |columnname   |
+-----------+----------------+-----------+-------------+
|Very High  |High            |Medium     |predchurnrisk|
|Active     |Lapsed          |Renew      |userstatus   |
|Very High  |High            |Medium     |predinmarket |
|High flyers|Watching Pennies|Big pockets|predsegmentid|
|Male       |Female          |Others     |usergender   |
+-----------+----------------+-----------+-------------+
I want a variable domainvalues of type Array[(String, List[String])], holding entries such as:
[predchurnrisk,(Very High, High, Medium)]
[userstatus,(Active, Lapsed, Renew)]
.
How can this be done with map or foreach?
Upvotes: 0
Views: 169
Reputation: 4948
As a start:
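import org.apache.spark.sql.functions._
// Needed for toDF, so it must come before the DataFrame is built; assumes a SparkSession named spark, as in spark-shell
import spark.implicits._
val df = sc.parallelize(Seq(("Very High","High","Medium","predchurnrisk"),("Active","Lapsed","Renew","userstatus"))).toDF("col1","col2","col3","columnname")
// Combine the three value columns into a single array column, then drop the originals
df.withColumn("arr", array("col1", "col2", "col3")).drop("col1","col2","col3").show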
Maybe you can take it from here. Cheers!
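One possible way to get from there to the requested Array[(String, List[String])] is to collect the rows to the driver and map each Row to a tuple. This is only a minimal sketch: it assumes the data is small enough to collect, that all value columns are non-null strings, and it creates its own local SparkSession (in spark-shell, `spark` already exists and those lines can be skipped).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; reuse the existing `spark` in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("domainvalues").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Very High", "High", "Medium", "predchurnrisk"),
  ("Active", "Lapsed", "Renew", "userstatus")
).toDF("col1", "col2", "col3", "columnname")

// Value columns to gather for each row (names taken from the question's table).
val valueCols = List("col1", "col2", "col3")

// collect() brings all rows to the driver as Array[Row];
// map then pairs each column name with the list of its values.
val domainvalues: Array[(String, List[String])] = df
  .collect()
  .map { row =>
    val name = row.getAs[String]("columnname")
    val values = valueCols.map(c => row.getAs[String](c))
    (name, values)
  }

domainvalues.foreach(println)
```

Here map builds the result, while foreach is used only for printing, since foreach returns Unit and cannot produce the array by itself.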
Upvotes: 1