hiits100rav

Reputation: 73

Spark SQL: NULL handling in IF

I am trying to use IF (as a stand-in for Spark's COALESCE) on top of a left-outer-joined output, but NULL does not seem to be handled as expected. Here are my base tables, sample query, output, and expected output:

Base tables:

t1:
a,100
b,101
c,102

t2:
101

Query:

select a.x, a.x1, IF(b.x1 is NULL,a.x1,b.x1) from t1 a LEFT OUTER JOIN t2 b on a.x1=b.x1;

Output:

a,100,null
b,101,101
c,102,null

Expected:

a,100,100
b,101,101
c,102,102

I have also tried wrapping the above query in a subquery and applying the IF on top of it, with no success. Please suggest if I am missing something.
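For reference, the COALESCE form of the same fallback (COALESCE returns its first non-NULL argument) would be:

select a.x, a.x1, COALESCE(b.x1, a.x1) from t1 a LEFT OUTER JOIN t2 b on a.x1=b.x1;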

Upvotes: 3

Views: 17023

Answers (2)

user3279189

Reputation: 1653

This seems to work:

File: tbl1

1   a
2   b
3   c

File: tbl2

1   c
3   d

// Spark 1.3+ needs the SQLContext implicits for .toDF(); pre-1.3 used
// import sqlContext.createSchemaRDD and registered the RDD directly.
import sqlContext.implicits._

case class c_tbl1(c1: String, c2: String)

// Parse the tab-separated file into case-class rows and register it
// as a temporary table.
sc.textFile("tbl1").map { row =>
  val parts = row.split("\t")
  c_tbl1(parts(0), parts(1))
}.toDF().registerTempTable("t_tbl1")

case class c_tbl2(c1: String, c2: String)

sc.textFile("tbl2").map { row =>
  val parts = row.split("\t")
  c_tbl2(parts(0), parts(1))
}.toDF().registerTempTable("t_tbl2")

// IF(t2.c1 is null, 1, 2) flags t_tbl1 rows with no match in t_tbl2.
sqlContext.sql("""select t.c1, t.c2, IF(t2.c1 is null, 1, 2), t2.c2
                  from t_tbl1 t left outer join t_tbl2 t2 on t.c1 = t2.c1""")
  .collect().foreach(println)


[1,a,2,c]
[2,b,1,null]
[3,c,2,d]
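In this output, the IF evaluates to 1 only for the unmatched row [2,b,1,null], so the NULL check behaves as expected after the left outer join.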

Upvotes: 1

HaveNoDisplayName

Reputation: 8497

Try a CASE statement (though I am not sure whether this CASE syntax is supported by Spark SQL):

select a.x, a.x1,
      CASE WHEN b.x1 IS NULL THEN a.x1
           ELSE b.x1
      END as bx1
from t1 a LEFT OUTER JOIN t2 b on a.x1=b.x1;
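
Equivalently, COALESCE(b.x1, a.x1) collapses the CASE into a single call, since COALESCE returns its first non-NULL argument. Below is a minimal DataFrame sketch of the same query, assuming Spark 1.4+ and the sc/sqlContext from the answer above; the table and column names follow the question:

import sqlContext.implicits._
import org.apache.spark.sql.functions.coalesce

// Recreate the question's base tables in memory.
val t1 = sc.parallelize(Seq(("a", 100), ("b", 101), ("c", 102))).toDF("x", "x1")
val t2 = sc.parallelize(Seq(Tuple1(101))).toDF("x1")

// coalesce falls back to a.x1 whenever the join left b.x1 NULL.
t1.as("a")
  .join(t2.as("b"), $"a.x1" === $"b.x1", "left_outer")
  .select($"a.x", $"a.x1", coalesce($"b.x1", $"a.x1").as("bx1"))
  .show()

// Prints the question's expected output:
// a,100,100
// b,101,101
// c,102,102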

Upvotes: 1
