Reputation: 3
I wanted to create and save a table filled with random numbers. Everything went fine so far, but I don't understand how to get the multi-dimensional array tmp into a DataFrame with the schema defined at the top.
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, DoubleType}
import org.apache.spark.sql.Row

val schema = StructType(
  StructField("rowId", IntegerType, true) ::
  StructField("t0_1", DoubleType, true) ::
  StructField("t0_2", DoubleType, true) ::
  StructField("t0_3", DoubleType, true) ::
  StructField("t0_4", DoubleType, true) ::
  StructField("t0_5", DoubleType, true) ::
  StructField("t0_6", DoubleType, true) ::
  StructField("t0_7", DoubleType, true) ::
  StructField("t0_8", DoubleType, true) ::
  StructField("t0_9", DoubleType, true) ::
  StructField("t0_10", DoubleType, true) :: Nil)

val columnNo = 10
val rowNo = 50

// columns x rows; use columnNo instead of a hard-coded 10
val tmp = Array.ofDim[Double](columnNo, rowNo)
val temp = new scala.util.Random
for (r <- 1 to rowNo) {
  for (c <- 1 to columnNo) {
    tmp(c - 1)(r - 1) = temp.nextDouble
    println("Value of " + c + "/" + r + ": " + tmp(c - 1)(r - 1))
  }
}

// this is where I'm stuck -- tmp is an Array of Arrays
val df = sc.parallelize(tmp).toDF
df.show
Upvotes: 0
Views: 1435
Reputation: 27373
You cannot transform an Array of Arrays into a DataFrame directly; you need a collection of tuples or case classes instead. Here is a variant based on a case class corresponding to the schema you want (a tuple-based sketch follows after the code):
case class Record(
  rowId: Option[Int],
  t0_1: Option[Double],
  t0_2: Option[Double],
  t0_3: Option[Double],
  t0_4: Option[Double],
  t0_5: Option[Double],
  t0_6: Option[Double],
  t0_7: Option[Double],
  t0_8: Option[Double],
  t0_9: Option[Double],
  t0_10: Option[Double]
)
val rowNo = 50
val temp = new scala.util.Random
val data = (1 to rowNo).map(r =>
  Record(
    Some(r),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble),
    Some(temp.nextDouble)
  )
)
val df = sc.parallelize(data).toDF
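For completeness, here is a minimal sketch of the tuple-based alternative mentioned above. The names tupleData and tupleDf are just illustrative, and the column names passed to toDF are chosen to match the schema in the question:

// Spark can also derive a schema from a Seq of tuples;
// toDF(...) lets us name the resulting columns explicitly.
val rnd = new scala.util.Random
val tupleData = (1 to rowNo).map { r =>
  (r, rnd.nextDouble, rnd.nextDouble, rnd.nextDouble, rnd.nextDouble,
    rnd.nextDouble, rnd.nextDouble, rnd.nextDouble, rnd.nextDouble,
    rnd.nextDouble, rnd.nextDouble)
}
val tupleDf = sc.parallelize(tupleData).toDF(
  "rowId", "t0_1", "t0_2", "t0_3", "t0_4", "t0_5",
  "t0_6", "t0_7", "t0_8", "t0_9", "t0_10")

Note that primitive tuple fields become non-nullable columns, unlike the Option-based case class above, so the case-class variant is closer to the nullable schema defined in the question. If you want to keep your exact schema, you could also build an RDD[Row] and pass it together with the schema to spark.createDataFrame.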
Upvotes: 1