sri
sri

Reputation: 161

Handling the NULL data in impala

I am trying understand the behavior of arithmetic operation in impala

i am using the following table

╔════╦══════════════╦══════════╗
║ id ║  name        ║ salary   ║
╠════╬══════════════╬══════════╣
║  1 ║  cde         ║ 5636     ║
║  2 ║  asd         ║  148     ║
║  3 ║  pwe         ║  null    ║
║  4 ║  lmn         ║  959     ║
╚════╩══════════════╩══════════╝

when i execute the following query

select salary+20 as sum1 from table where id=3;

it returns me

     |sum1
  ---|-----
   1 |NULL

when i run sum on the column with

select sum(salary) as sum1 from table;

     |sum1
  ---|-----
   1 |6743

I am unable to understand how same arithmetic operations are behaving differently

Upvotes: 1

Views: 21516

Answers (3)

boethius
boethius

Reputation: 438

Have you tried something like:

select sum(COALESCE(salary,0)) as sum1 from table;

This should ensure that the value returned is a number. Similar to leftjoin's answer. You could also do this with case statements or where statements.

Source:

https://www.cloudera.com/documentation/enterprise/5-4-x/topics/impala_conditional_functions.html

Upvotes: 3

leftjoin
leftjoin

Reputation: 38335

NULL is not a zero (0) and zero is not a NULL, it's a value, and this is most important. NULL is absence of the value, nothing. sum() aggregates values only. If it were no any values in the dataset then it returns NULL. If it were some values, sum() will aggregate them, ignoring nulls. If you want to convert NULL it to zero, use NVL() function:

select nvl(sum(salary),0) as sum1 from table where ...

This query will return 0 in case there are no values in the dataset.

Upvotes: 1

Francisco
Francisco

Reputation: 2228

These are different arithmetic queries.

In your first query you want the DB to return salary+20 from row 3. Well the salary of row 3 is NULL. So the DB will see that NULL+20=NULL and return NULL.

However in the second query: sum(salary) from table; is asking the DB to sum the entire salary column and return it to you. So it looks at it and does 5636+148+959=6743 (ignores NULL).

In conclusion in the first query you are executing a basic arithmetic operator. But the second query sum() is an operator that is applied to returned rows. As a better test, try running select sum(salary) as sum1 from table where id=3 and see what you get to better understand what is happening.

Upvotes: 0

Related Questions