Reputation: 63
How can I create a Spark Dataset from a pair RDD (JavaPairRDD) using Java? Could you please help?
Upvotes: -5
Views: 239
Reputation: 10406
Basically, to go from a Dataset to a pair RDD in Java, you first need to convert the Dataset to an RDD using javaRDD() and then to a pair RDD using mapToPair.
Here is an example:
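import static org.apache.spark.sql.functions.col;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;

// Create a Dataset of rows with two long columns, x = id and y = id * id ("spark" is an existing SparkSession)
Dataset<Row> ds = spark
    .range(5)
    .select(col("id").alias("x"),
            col("id").multiply(col("id")).alias("y"));
// Convert to a JavaRDD<Row>, then map each row to an (x, y) pair
JavaPairRDD<Long, Long> pairRDD = ds
    .javaRDD()
    .mapToPair(row -> new Tuple2<>(row.getLong(0), row.getLong(1)));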
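The question asks for the opposite direction, from a pair RDD to a Dataset. A minimal sketch of that, assuming a JavaPairRDD<Long, Long> and a local SparkSession: the pair RDD's underlying RDD of Tuple2 is passed to createDataset together with a tuple encoder. The class name PairRddToDataset, the helper toDataset, and the column names "x" and "y" are made up for illustration; other key/value types would need matching encoders.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class PairRddToDataset {

    // Hypothetical helper: wraps a JavaPairRDD<Long, Long> in a typed Dataset.
    static Dataset<Tuple2<Long, Long>> toDataset(SparkSession spark,
                                                 JavaPairRDD<Long, Long> pairRDD) {
        return spark.createDataset(
                pairRDD.rdd(), // the underlying RDD<Tuple2<Long, Long>>
                Encoders.tuple(Encoders.LONG(), Encoders.LONG()));
    }

    public static void main(String[] args) {
        // Assumption: a local session, just for trying the conversion out.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("PairRddToDataset")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Build a small pair RDD to convert.
        List<Tuple2<Long, Long>> data = Arrays.asList(
                new Tuple2<>(1L, 1L), new Tuple2<>(2L, 4L), new Tuple2<>(3L, 9L));
        JavaPairRDD<Long, Long> pairRDD = jsc.parallelizePairs(data);

        // Tuple columns default to _1 and _2; toDF renames them.
        Dataset<Row> df = toDataset(spark, pairRDD).toDF("x", "y");
        df.show();

        spark.stop();
    }
}
```

If you only need an untyped Dataset<Row>, the toDF call at the end is enough; keep the Dataset<Tuple2<Long, Long>> if you want to go on working with typed tuples.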
Upvotes: 0