Reputation: 847
I have a case class in scala case class TestDate (id: String, loginTime: java.sql.Date)
I created 2 RDD's of type TestDate
I wanted to do an inner join on two rdd's where the values of loginTime column is equal. Please find the code snippet below,
firstRDD.toDF.registerTempTable("firstTable")
secondRDD.toDF.registerTempTable("secondTable")
val res = sqlContext.sql("select * from firstTable INNER JOIN secondTable on to_date(firstTable.loginTime) = to_date(secondTable.loginTime)")
I'm not getting any exception. But i'm not getting correct answer too. It does a cartesian and some random dates are generated in the result.
Upvotes: 0
Views: 920
Reputation: 847
The issue was due to a wrong format given while creating the date object. When the format was rectified, it worked fine.
Upvotes: 1
Reputation: 17872
You can try using another approach:
val df1 = firstRDD.toDF
val df2 = secondRDD.toDF
val res = df1.join(df2, Seq("loginTime"))
If it doesn't work, you can try casting your dates to string:
val df1 = firstRDD.toDF.withColumn("loginTimeStr", col("loginTime").cast("string"))
val df2 = secondRDD.toDF.withColumn("loginTimeStr", col("loginTime").cast("string"))
val res = df1.join(df2, Seq("loginTimeStr"))
Finally, maybe the problem is that you also need the ID column in the join?
val df1 = firstRDD.toDF
val df2 = secondRDD.toDF
val res = df1.join(df2, Seq("id", "loginTime"))
Upvotes: 0