Convert a DataFrame to an RDD and join

I am trying to compare join performance between a DataFrame and an RDD, so I converted each DataFrame to an RDD and then applied a join, which fails with the error below.

Error: console:34: error: not found: value pairRDD1 (raised at pairRDD1.join(pairRDD2))

Am I missing something here?

val df1=sqlContext.sql("select column1,column2,column3,column4 from table_1 AS a")

df1.printSchema()

val pairRdd1=df1.rdd.map(r => (r.getString(0),r.getString(1),r.getString(2),r.getString(3))).map { case (column1,column2,column3,column4) => ((column1),column2,column3,column4)}

val df2=sqlContext.sql("select column1,column2,column3,column4 from table_2 AS b")   

df2.printSchema()

val pairRdd2=df2.rdd.map(r => ((r.getString(0)),r.getString(1),r.getString(2),r.getString(3))).map {case (column1,column2,column3,column4) => ((column1),column2,column3,column4)}

val joined = pairRDD1.join(pairRDD2)



console:34: error: not found: value pairRDD1
pairRDD1.join(pairRDD2)

Thanks

Upvotes: 0

Views: 1308

Answers (1)

koiralo

Reputation: 23099

The error message says it all: console:34: error: not found: value pairRDD1 pairRDD1.join(pairRDD2)

The compiler cannot find pairRDD1 because no value with that name exists. Scala identifiers are case-sensitive, and the values you defined are named pairRdd1 and pairRdd2.

You have val joined = pairRDD1.join(pairRDD2)

which should be

val joined = pairRdd1.join(pairRdd2)

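Fixing the names alone is not enough, though: join is only available on an RDD of key-value pairs, RDD[(K, V)], and your map calls produce 4-element tuples rather than pairs. You can create an RDD[(String, (String, String, String))] keyed by column1 with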

  val pairRdd1=df1.rdd.map(r => (r.getString(0),(r.getString(1),r.getString(2),r.getString(3))))

  val pairRdd2=df2.rdd.map(r => (r.getString(0),(r.getString(1),r.getString(2),r.getString(3))))
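
To see the keyed shape and the join working end to end without the Hive tables, here is a minimal sketch. It assumes a local SparkContext, and the sample rows, the object name PairRddJoinSketch and the helper keyByFirstColumn are made up for illustration. They stand in for the data coming out of table_1 and table_2 and are not part of your code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: all names and sample rows below are illustrative only.
object PairRddJoinSketch {
  type Row4 = (String, String, String, String)
  type Rest = (String, String, String)

  // Turn each 4-column row into (key, value) so that join becomes available.
  def keyByFirstColumn(rows: RDD[Row4]): RDD[(String, Rest)] =
    rows.map { case (c1, c2, c3, c4) => (c1, (c2, c3, c4)) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pair-rdd-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-ins for df1.rdd and df2.rdd after extracting the four string columns.
      val rows1 = sc.parallelize(Seq(("k1", "a2", "a3", "a4"), ("k2", "b2", "b3", "b4")))
      val rows2 = sc.parallelize(Seq(("k1", "x2", "x3", "x4"), ("k3", "y2", "y3", "y4")))

      val pairRdd1 = keyByFirstColumn(rows1)
      val pairRdd2 = keyByFirstColumn(rows2)

      // Inner join on the key: only keys present on both sides survive.
      val joined: RDD[(String, (Rest, Rest))] = pairRdd1.join(pairRdd2)
      joined.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With these sample rows only the key k1 exists on both sides, so the join returns a single record pairing the two value tuples for k1. If you need to keep unmatched keys as well, leftOuterJoin or fullOuterJoin on the same pair RDDs would be the thing to try.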

Upvotes: 1
