Reputation: 595
I am new to Spark and I want to convert a DataFrame to a paired RDD. My DataFrame looks like:
tagname,value,Minute
tag1,13.87,5
tag2,32.50,10
tag3,35.00,5
tag1,10.98,2
tag5,11.0,5
I want a paired RDD of (tagname, value). I tried:
val byKey:Map[String,Long] = winowFiveRDD.map({case (tagname,value) => (tagname)->value})
I am getting the following error:
error: constructor cannot be instantiated to expected type
Help is much appreciated. Thanks in advance.
Upvotes: 0
Views: 126
Reputation: 35219
I'd use Dataset.as:
import org.apache.spark.rdd.RDD
import spark.implicits._  // needed for toDF and the $ column syntax

val df = Seq(
  ("tag1", "13.87", "5"), ("tag2", "32.50", "10"), ("tag3", "35.00", "5"),
  ("tag1", "10.98", "2"), ("tag5", "11.0", "5")
).toDF("tagname", "value", "minute")

// Cast value to Double, type the rows as (String, Double), then drop to an RDD
val pairedRDD: RDD[(String, Double)] = df
  .select($"tagname", $"value".cast("double"))
  .as[(String, Double)].rdd
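For a quick sanity check (output shown as a comment; ordering may vary across partitions):

pairedRDD.collect
// Array((tag1,13.87), (tag2,32.5), (tag3,35.0), (tag1,10.98), (tag5,11.0))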
Upvotes: 1
Reputation: 695
Looking at RDD.scala, map returns a MapPartitionsRDD. You cannot cast that to a Map directly. Just remove the Map[String,Long] type annotation and it will work.
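A minimal sketch of the fix, assuming winowFiveRDD is an RDD of (String, Double) pairs as in the question:

val byKey = winowFiveRDD.map { case (tagname, value) => tagname -> value }
// byKey is an RDD[(String, Double)] (a MapPartitionsRDD), not a scala.collection.Map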
Upvotes: 0
Reputation: 10082
org.apache.spark.sql.Row has typed get methods (e.g. getString, getDouble) for certain datatypes.
import spark.implicits._  // needed for toDF

val df = sc.parallelize(List(
  ("tag1", 13.87, 5),
  ("tag2", 32.50, 10),
  ("tag3", 35.00, 5),
  ("tag1", 10.98, 2),
  ("tag5", 11.0, 5)
)).toDF("tagname", "value", "minute")

// Pull each Row apart with the typed getters; go through .rdd first,
// since in Spark 2.x+ df.map would need an Encoder for the result type
val pairedRDD = df.rdd.map(x => (x.getString(0), x.getDouble(1)))
pairedRDD.collect
Array[(String, Double)] = Array((tag1,13.87), (tag2,32.5), (tag3,35.0), (tag1,10.98), (tag5,11.0))
You can then call pairedRDD.collect.toMap to convert it to a Scala Map. Note that tag1 appears twice in the question's data, which a Map cannot represent: one of the two tag1 values will be dropped in the conversion.
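For illustration, Scala's toMap keeps only the last occurrence of a duplicate key:

pairedRDD.collect.toMap
// Map(tag1 -> 10.98, tag2 -> 32.5, tag3 -> 35.0, tag5 -> 11.0)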
Upvotes: 1