robx
robx

Reputation: 2329

How to detect NULL rows in PostgreSQL sum()

I want to aggregate the sum of a column, while tracking the presence of NULL values which signal an error condition. E.g., take the table numbers:

# select * from numbers;
  n   |  l  
------+-----
    1 | foo
    2 | foo
 NULL | bar
    4 | bar

Given a label l, I want to compute the sum of numbers n with that label, provided there are no NULL values. Ideally, for a label without any rows, the sum would be 0. So I'm looking for some query q such that q('foo') = 3, q('baz') = 0 and q('bar') somehow signals an error, e.g. by returning NULL.

I started with the sum() aggregate function, but that converts NULL rows to 0. One solution would be a variant that returns NULL provided there are any NULL values.

sum() gives

# select sum(n) from numbers where l = 'bar';
 sum 
-----
   4

but I'd rather have sumnull() with

# select sumnull(n) from numbers where l = 'bar';
 sumnull 
---------
  NULL

The best solution I've found so far is to also count non-NULL rows and compare to the total count:

# select sum(n), count(*), count(n) as notnull from numbers;
 sum | count | notnull 
-----+-------+---------
   7 |     4 |       3

Then if count is unequal to notnull, I know the result is not valid.

Upvotes: 12

Views: 18123

Answers (5)

Stephen Chung
Stephen Chung

Reputation: 14605

I have the following... just in case it is useful to somebody:

SELECT SUM(n) * CASE WHEN count(n) = count(*) THEN 1 ELSE NULL END FROM numbers

The trick is to detect whether there are NULL rows by comparing counts, then forcing a math operation with either 1 or NULL which yields a NULL when there are NULL rows.

Upvotes: 0

Maxim
Maxim

Reputation: 2401

1) Use COALESCE

SELECT SUM(COALESCE(n,'NaN'::numeric)) FROM numbers;

If any row is NULL then result will be as 'NaN'

2) You will get NULL as result if any row has NULL value else result as number

SELECT 
    CASE WHEN (SELECT COUNT(*) FROM numbers WHERE n IS NULL) > 0 THEN NULL
        ELSE (SELECT SUM(COALESCE(n, 0)) FROM numbers)
END

Upvotes: 3

Felk
Felk

Reputation: 8234

You can have it return NaN for those cases by using coalesce to replace NULL with NaN;

create table t(
  val int
);

insert into t (val) values (1), (2), (NULL);
select sum(coalesce(val, double precision 'NaN')) from t;

Result: NaN.

insert into t (val) values (1), (2), (3);
select sum(coalesce(val, double precision 'NaN')) from t;

Result: 6.

See this SQLFiddle

Upvotes: 2

klin
klin

Reputation: 121889

You can create a custom aggregate, e.g.:

create or replace function int_sum_null(int, int)
returns int language sql as $$
    select $1 + $2
$$;

create aggregate sumnull(integer) (
    sfunc = int_sum_null,
    stype = int
);

select sum(n), sumnull(n)
from numbers;

 sum | sumnull 
-----+---------
   7 | <null>      
(1 row) 

Update #1

Solutions without a custom aggregate:

select case 
    when bool_or(n is null) then null 
    else sum(n) end
from numbers;

select coalesce((
    select sum(n)
    from numbers
    having not bool_or(n is null)), null);

These variants are based on the Clodoaldo Neto idea. If you like them please upvote his answer too.


Update #2

Modify the custom aggregate sumnull and add the initial condition:

drop aggregate sumnull(integer);
create aggregate sumnull(integer) (
    sfunc = int_sum_null,
    stype = int,
    initcond = 0
);

to get the results you have described in the updated question:

create table numbers (n int, l text);
insert into numbers values 
(1, 'foo'), (2, 'foo'), (null, 'bar'), (4, 'bar');

select 
    sumnull(n) filter (where l = 'foo') foo, 
    sumnull(n) filter (where l = 'bar') bar, 
    sumnull(n) filter (where l = 'baz') baz
from numbers;

 foo | bar | baz 
-----+-----+-----
   3 |     |   0
(1 row)

Upvotes: 8

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125524

Is an empty set good enough?

create table numbers (n int);
insert into numbers values (1),(2),(null),(4);

select sum(n)
from numbers
having bool_and(n is not null);
 sum 
-----
(0 rows)

If you really need a null value it is a bit more complex:

with sum_null as (
    select sum(n) as sum_n
    from numbers
    having bool_and(n is not null)
)
select case
    when not exists (select 1 from sum_null) then null
    else (select sum_n from sum_null) end
;
 sum_n 
-------

(1 row)

Replacing the having line for:

having not bool_or(n is null)

is less readable but potentially much faster since it can stop searching at the first null found.

https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE

Upvotes: 14

Related Questions