Reputation: 729
I'm trying to create a Dataset from 4 arrays. I have arrays like this:
// Array 1
val rawValues = rawData.select(collect_list("rawValue")).first().getList[Double](0).asScala.toArray
// Array 2
var trendValues = Array[Double]()
// Array 3
var seasonalValues = Array[Double]()
// Array 4
var remainder = Array[Double]()
I have populated the last 3 arrays based on some computations (not included here) on the first array. All 4 arrays are of equal size, and to populate the first array, another dataset's column rawValue is converted into an array as shown above.
After doing all the computations, I want to create a Dataset with 4 separate columns, each column representing one of the 4 arrays above.
So, basically, how can I create a Dataset from arrays? I'm struggling with this.
Please help.
Upvotes: 0
Views: 2844
Reputation: 473
You just need to club them together in a Sequence:
case class ArrayMap(rawValues: Double, trendValues: Double, seasonalValues: Double, remainder: Double)
import spark.implicits._
val data = for(i <- arr1.indices) yield ArrayMap(arr1(i), arr2(i) ,arr3(i) ,arr4(i))
data.toDF()
// alternatively, though it takes more steps
arr1.zip(arr2).zip(arr3).zip(arr4)
.map(a => ArrayMap(a._1._1._1, a._1._1._2, a._1._2, a._2))
.toSeq.toDF()
Use zipAll if the arrays are of different sizes.
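A minimal sketch of how zipAll behaves (pure Scala, with illustrative values): the shorter side is padded with the supplied defaults, so no elements are dropped the way plain zip would drop them.

```scala
object ZipAllSketch extends App {
  val a = Array(1.0, 2.0, 3.0)
  val b = Array(4.0, 5.0)

  // zipAll(that, thisElem, thatElem): pads whichever side runs out first
  val padded = a.zipAll(b, 0.0, 0.0)
  // padded: Array((1.0,4.0), (2.0,5.0), (3.0,0.0))

  padded.foreach(println)
}
```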
EDIT:
I am not sure how the data flows downstream, but if you are creating all 4 arrays from a DataFrame, I would suggest transforming the data within the DataFrame instead of taking this approach (especially if the data size is large).
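To illustrate the DataFrame-native route, here is a hedged sketch: the actual trend/seasonal/remainder computations aren't shown in the question, so the column expressions below are placeholders only, standing in for whatever the real transforms are. The point is that everything stays distributed rather than being collected to the driver as arrays.

```scala
import org.apache.spark.sql.functions._

// Placeholder transforms: substitute the real trend/seasonal logic here.
// Each withColumn adds one of the four desired columns to the same DataFrame.
val result = rawData
  .withColumn("trendValue", col("rawValue") * lit(0.5))
  .withColumn("seasonalValue", col("rawValue") - col("trendValue"))
  .withColumn("remainder",
    col("rawValue") - col("trendValue") - col("seasonalValue"))
```

This avoids the collect_list / asScala round-trip entirely, which matters once the data no longer fits comfortably on a single machine.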
Upvotes: 1