Reputation: 342

Count() not working as expected in SQL

A coworker put together a table for me that I need for mapping data, and table consists of two fields, OP_ID and BillType. In theory, there should be a one-to-one correspondence between the two: every Op_ID should have one and only one bill type.

When I started using it, however, I noticed that there were duplicate OP_IDs with different BillTypes. E.g:

OP_Id               BillType 
007a000v9GWkAAM BillReady 
007a000v9GWkAAM RateReady

So first I built a query to group the data into unique combinations of IDs and BillTypes:

SELECT OP_ID, BillType
FROM MappingTable
GROUP BY OP_ID, BillType

Simple enough. This set should and does include the above two records. Then, I wrapped another query around this to count OP_Ids within the aggregated dataset. Theoretically, any OP_ID with more than one BillType should occur twice, and so it should return a count > 1. Right?

SELECT OP_ID, BillType, Count(OP_ID)
FROM 
    (
    SELECT OP_ID, BillType
    FROM MappingTable
    GROUP BY OP_ID, BillType
    ) Base
GROUP BY  OP_ID, BillType
HAVING Count(OP_ID) > 1

But this query returns nothing whatsoever. Here's what even more baffling: when I remove the HAVING clause and limit the query to pull only the above OP_ID, since I already know it's a dupe, where's what I get:

OP_ID          BillType  CountOfOP_IDs
007a000v9GWkAAM BillReady   1
007a000v9GWkAAM RateReady   1

So there are clearly two records for OP_ID 007a000v9GWkAAM, but SQL is counting only one!

This seems so simple, and I'm sure I'm just missing something basic about how COUNT() works. For reference, I'm working on SQL Server 2014 and both columns are nvarchar. I've also confirmed that SQL evaluates the OP_IDs in both records as identical. Anyone know why this is happening?

Upvotes: 3

Answers (4)

Andreas

Reputation: 159260

The statement

SELECT OP_ID, BillType
FROM MappingTable
GROUP BY OP_ID, BillType

is the long way to say

SELECT DISTINCT OP_ID, BillType
FROM MappingTable

When you now have distinct combinations of OP_ID and BillType, adding a new GROUP BY on those two fields will do nothing.

SELECT ...
FROM ( SELECT DISTINCT OP_ID, BillType
       FROM MappingTable
     ) Base
GROUP BY  OP_ID, BillType
HAVING Count(OP_ID) > 1

The new "groups" will all consist of 1 row from the inner SELECT, so COUNT will always be 1, meaning that Count(OP_ID) > 1 is always false and you will get nothing returned.

Perhaps you meant to find OP_Id values with multiple BillType values. If so, you should drop BillType from the GROUP BY, as suggested by @DonKirkby.

You can get two examples of those BillType values in the result by also retrieving the first and last BillType value, if you are interested in that (often good when researching, at least).

SELECT OP_ID, COUNT(*), MIN(BillType), MAX(BillType)
FROM ( SELECT DISTINCT OP_ID, BillType
       FROM MappingTable
     ) Base
GROUP BY OP_ID
HAVING COUNT(*) > 1

You can shorten the entire statement by using COUNT(DISTINCT ...), also as suggested by @DonKirkby.

SELECT OP_ID, COUNT(DISTINCT BillType), MIN(BillType), MAX(BillType)
FROM MappingTable
GROUP BY OP_ID
HAVING COUNT(DISTINCT BillType) > 1

Upvotes: 3

Steve Barron

Reputation: 944

SELECT * 
FROM MappingTable 
WHERE OP_ID in (SELECT OP_ID 
                FROM (SELECT OP_ID, count(*) ct
                      FROM MappingTable
                      GROUP BY OP_ID)
                WHERE ct > 1)

There are not (at least in your example) duplicates for BOTH columns, just unexpected duplicates for OP_ID.

Upvotes: 1

Gordon Linoff

Reputation: 1271151

Your approach makes sense. My guess is that the OP_ID values are subtly different -- perhaps due to encoding issues or character look-alikes.

This is easy to find. What does this query return?

select mt.*
from mappingtable
where op_id = '007a000v9GWkAAM';

Incidentally, you can simplify your query to:

SELECT OP_ID
FROM MappingTable
GROUP BY OP_ID
HAVING MIN(BillType) <> MAX(BillType);

If you want to see the BillType values then add GROUP_CONCAT(BillType) to the SELECT.

EDIT:

The above is all correct, but your query doesn't work because the outer query is grouping by OP_ID and BILL_TYPE. Try this version:

SELECT OP_ID, Count(OP_ID)
FROM (SELECT OP_ID, BillType
      FROM MappingTable
      GROUP BY OP_ID, BillType
      ) Base
GROUP BY OP_ID
HAVING Count(OP_ID) > 1;

Yours returns no rows because you are grouping on the same keys twice. The subquery removes duplicates, so the count for the outer would always be 1.

Upvotes: 1

Don Kirkby

Reputation: 56230

Count counts the number of rows that have been grouped into one. Just remove bill type from the outer group by.

Also look at the count distinct option. That might be easier.

Upvotes: 4

Count() not working as expected in SQL

Answers (4)

Related Questions