Reputation: 497
I want to convert a DataFrame which contains Double values into a List so that I can use it for calculations. How can I get a List of the correct type (i.e. List[Double])?
My approach is this :
var newList = myDataFrame.collect().toList
but it returns a List[org.apache.spark.sql.Row], and I don't know how to work with that type.
Is it possible to skip that step and simply pass my DataFrame into a function and do the calculations there? (For example, I want to compare the third element of its second column with a specific Double. Can I do that directly from the DataFrame?)
In any case, I need to understand how to create a List of the right type each time!
EDIT:
Input Dataframe:
+---+---+
|_c1|_c2|
+---+---+
|0 |0 |
|8 |2 |
|9 |1 |
|2 |9 |
|2 |4 |
|4 |6 |
|3 |5 |
|5 |3 |
|5 |9 |
|0 |1 |
|8 |9 |
|1 |0 |
|3 |4 |
|8 |7 |
|4 |9 |
|2 |5 |
|1 |9 |
|3 |6 |
+---+---+
Result after conversion:
List((0,0), (8,2), (9,1), (2,9), (2,4), (4,6), (3,5), (5,3), (5,9), (0,1), (8,9), (1,0), (3,4), (8,7), (4,9), (2,5), (1,9), (3,6))
But every element in the List has to be of type Double.
Upvotes: 0
Views: 10336
Reputation: 23109
You can cast the column you need to Double, convert the DataFrame to an RDD, and collect it.
If you have data that cannot be parsed, you can use a udf to clean it before casting to Double:
import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.functions.udf

val stringToDouble = udf((data: String) => {
  Try(data.toDouble) match {
    case Success(value) => value
    case Failure(_)     => Double.NaN
  }
})
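The Try-based parsing inside the udf is plain Scala and can be sanity-checked outside Spark; a minimal sketch (parseDouble is just an illustrative name for the same logic):

```scala
import scala.util.{Try, Success, Failure}

// Same logic as inside the udf: parse the string, fall back to NaN on failure.
def parseDouble(data: String): Double = Try(data.toDouble) match {
  case Success(value) => value
  case Failure(_)     => Double.NaN
}

println(parseDouble("9.00000")) // 9.0
println(parseDouble("xyz"))     // NaN
```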
import org.apache.spark.sql.types.DoubleType
import spark.implicits._  // assumes a SparkSession named spark

val df = Seq(
  ("0.000", "0"),
  ("0.000008", "24"),
  ("9.00000", "1"),
  ("-2", "xyz"),
  ("2adsfas", "1.1.1")
).toDF("a", "b")
  .withColumn("a", stringToDouble($"a").cast(DoubleType))
  .withColumn("b", stringToDouble($"b").cast(DoubleType))
After this, you will get the following output:
+------+----+
|a |b |
+------+----+
|0.0 |0.0 |
|8.0E-6|24.0|
|9.0 |1.0 |
|-2.0 |NaN |
|NaN |NaN |
+------+----+
To get an Array[(Double, Double)]:
val result = df.rdd.map(row => (row.getDouble(0), row.getDouble(1))).collect()
The result will be of type Array[(Double, Double)]; call .toList on it if you need a List[(Double, Double)] like in the question.
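Once collected, everything is ordinary Scala. A small sketch of the kind of calculation the question mentions, using a literal array in place of the collected result (the values mirror the sample output above):

```scala
// Stand-in for the Array[(Double, Double)] returned by collect().
val result = Array((0.0, 0.0), (8.0e-6, 24.0), (9.0, 1.0), (-2.0, Double.NaN))

// Convert to the List the question asks for.
val asList: List[(Double, Double)] = result.toList

// Compare the third element of the second column with a specific double.
val thirdOfSecondColumn = asList(2)._2      // 1.0
val isBigger = thirdOfSecondColumn > 0.5    // true
```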
Upvotes: 4
Reputation: 193
Convert the DataFrame to a Dataset using a case class, then convert it to a list. It will return a list of your case-class objects; all fields (mapping to the columns of your table) are already typed, so you won't need to cast every time.
Please execute the code below to check and verify (Scala):
val wa = Array("one", "two", "two")
val wr = sc.parallelize(wa, 3).map(x => (x, "x", 1))
val wdf = wr.toDF("a", "b", "c")
case class wc(a: String, b: String, c: Int)
val wds = wdf.as[wc]  // convert the DataFrame to a Dataset[wc]
val myList = wds.collect.toList
myList.foreach(x => println(x))
myList.foreach(x => println(x.a.getClass, x.b.getClass, x.c.getClass))
Upvotes: 0
Reputation: 413
Select the columns you need and read each one out of the Row with getAs[Double]:
myDataFrame.select("_c1", "_c2")
  .collect()
  .map(each => (each.getAs[Double]("_c1"), each.getAs[Double]("_c2")))
  .toList
Upvotes: -1