msh210
msh210

Reputation: 272

NOT IN yielding no rows, unexpectedly

I have a user-defined function udf that returns a boolean, in MariaDB 5.5.

The following gives me what I expect:

select c, count(*)
from (
    select fld, count(*)c
    from tbl
    where udf(fld)
    group by fld
) t
group by c;

+---+----------+
| c | count(*) |
+---+----------+
| 1 |    12345 |
| 2 |     1234 |
| 3 |      123 |
| 4 |       12 |
| 5 |        1 |
+---+----------+
5 rows in set (26.75 sec)

Likewise, the following gives me the number 12345 (from the table just above), just as I expect:

select anotherfield, count(*)
from tbl
where udf(fld)
and fld in (
    select fld from (
        select fld,count(*)c
        from tbl
        group by fld
        having c=1
    )t
)
group by anotherfield with rollup;

I would expect that the following would also give me 12345:

select anotherfield, count(*)
from tbl
where udf(fld)
and fld not in (
    select fld from (
        select fld,count(*)c
        from tbl
        group by fld
        having c>1
    )t
)
group by anotherfield with rollup;

However, it gives me no rows. Why?

Upvotes: 0

Views: 35

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270391

As mentioned in a comment, if any of the rows returned by the subquery are NULL, then you get no rows back. One solution is to filter for them explicitly:

where fld not in (select fld 
                  from tbl
                  where fld is not null
                  group by fld
                  having count(*) > 1 
                )

My preferred method is to use not exists because it has the correct semantics:

where not exists (select 1 
                  from tbl t2
                  group by fld
                  having count(*) > 1 and
                         tbl.fld = t2.fld
                 )

That said, a more efficient way is usually to find some difference in the row, rather than checking for a count(*). That is, instead of getting all rows with the same field, it can stop when it gets the second one:

where not exists (select 1 
                  from tbl t2
                  where tbl.fld = t2.fld and
                        tbl.id <> t2.id -- or some appropriate column
                 )

Upvotes: 2

Related Questions