Dazzler

Reputation: 847

Unable to compare Spark SQL Date columns

I have a case class in Scala: case class TestDate(id: String, loginTime: java.sql.Date)

I created two RDDs of type TestDate.
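For reference, the RDDs were built roughly like this (an illustrative sketch; the file name, delimiter, and date pattern are assumptions, not my actual code):

val firstRDD = sc.textFile("first.csv").map { line =>
  val Array(id, dateStr) = line.split(",")
  // the pattern below is an assumed example; the real one differed
  val format = new java.text.SimpleDateFormat("dd-MM-yyyy")
  TestDate(id, new java.sql.Date(format.parse(dateStr).getTime))
}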

I want to do an inner join on the two RDDs where the values of the loginTime column are equal. Please find the code snippet below:

firstRDD.toDF.registerTempTable("firstTable")
secondRDD.toDF.registerTempTable("secondTable")

val res = sqlContext.sql("select * from firstTable INNER JOIN secondTable on to_date(firstTable.loginTime) = to_date(secondTable.loginTime)")

I'm not getting any exception, but I'm not getting the correct answer either. The join behaves like a Cartesian product, and some random dates are generated in the result.

Upvotes: 0

Views: 920

Answers (2)

Dazzler

Reputation: 847

The issue was caused by a wrong format string used while creating the date objects. Once the format was corrected, the join worked fine.
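For illustration, SimpleDateFormat is lenient by default, so a mismatched pattern parses without an exception and silently produces a bogus date (the pattern and input below are hypothetical, not my actual values):

import java.text.SimpleDateFormat

// Wrong pattern: "03" is read as the year and "2016" as the day,
// which lenient parsing rolls over into a garbage date.
val wrong = new SimpleDateFormat("yyyy-MM-dd")
val badDate = new java.sql.Date(wrong.parse("03-04-2016").getTime)

// A pattern matching the input yields the intended 2016-04-03.
val right = new SimpleDateFormat("dd-MM-yyyy")
val goodDate = new java.sql.Date(right.parse("03-04-2016").getTime)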

Upvotes: 1

Daniel de Paula

Reputation: 17872

You can try using another approach:

val df1 = firstRDD.toDF
val df2 = secondRDD.toDF

val res = df1.join(df2, Seq("loginTime"))
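Passing the common column name as a Seq performs an inner equi-join on loginTime and keeps only one copy of the join column in the result.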

If it doesn't work, you can try casting your dates to string:

import org.apache.spark.sql.functions.col

val df1 = firstRDD.toDF.withColumn("loginTimeStr", col("loginTime").cast("string"))
val df2 = secondRDD.toDF.withColumn("loginTimeStr", col("loginTime").cast("string"))

val res = df1.join(df2, Seq("loginTimeStr"))
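A java.sql.Date casts to its canonical yyyy-MM-dd string form, so the join then compares plain, unambiguous strings.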

Finally, maybe the problem is that you also need the id column in the join:

val df1 = firstRDD.toDF
val df2 = secondRDD.toDF

val res = df1.join(df2, Seq("id", "loginTime"))

Upvotes: 0
