user2784067
user2784067

Reputation: 69

Exclude rows from a SQL query based on number of rows an ID is present in

I was extracting some data earlier and found my results too numerous to be of any use (over 4 million rows returned). I found the culprit to be a large handful of IDs that have several thousand records apiece.

Heres the structure of the data. I have two tables. tbl1 has a single record of every 'ATID' (an ATID being a scale id, basically). Each ATID is in essence its own table, though - it has a record for every item on the scale. tbl2 enumerates those items - it has a row for every scale item per ATID. So it has anywhere from 1 to 500,000 rows for each ATID (I know, pretty f'ing excessive).

I'd like to exclude the ATIDs which have over 100 rows. FYI this is a database I have read-only access to.

Here's my original query:

SELECT tbl1.ATID, tbl1.ATDesc, tbl2.AValue, tbl2.ADesc, tbl2.APosNeg
FROM tbl1 LEFT OUTER JOIN tbl2 ON (tb1.ATID = tbl2.ATID);

Here's the query I used to get a list of ATIDs and record counts in tbl2 per ATID:

select ATID, count(*) as row_count
from tbl2
group by ATID
order by row_count desc;

For my end product I'd like to return only ATIDs with fewer than 100 rows in tbl2. However, if I even try to add a WHERE clause to the second query using the row_count variable, it fails. So I have no idea how to even approach combining the two queries and getting the WHERE clause in.

The only alternative I have is to exclude ATIDs specifically when I know they have too many records, but there are too many to exclude (ie, using WHERE NOT ATID = 1016 AND NOT ATID = 554 AND NOT .... etc)

Upvotes: 2

Views: 233

Answers (4)

Marc Guillot
Marc Guillot

Reputation: 6455

For the second query, the filters related to aggregate functions goes on a HAVING section

select ATID, count(*) as row_count
from tbl2
group by ATID
order by row_count desc;
having count(*) < 100

And for the first query you can simply use a subquery

SELECT tbl1.ATID, tbl1.ATDesc, tbl2.AValue, tbl2.ADesc, tbl2.APosNeg
FROM tbl1 
     LEFT OUTER JOIN tbl2 ON (tb1.ATID = tbl2.ATID)
WHERE tbl1.ATID in (select ATID
                    from tbl2
                    group by ATID
                    having count(*) < 100)

Upvotes: 1

Pரதீப்
Pரதீப்

Reputation: 93724

Here is one way to do it using

Count() Over() window aggregate will count the number of records for each ATID in tbl2 table. Then count the can be used to filter the ATID in Join condition

SELECT tbl1.ATID,
       tbl1.ATDesc,
       tbl2.AValue,
       tbl2.ADesc,
       tbl2.APosNeg
FROM   tbl1
       LEFT OUTER JOIN (SELECT Count(1)OVER(partition BY ATID) AS cnt,*
                        FROM   tbl2) tbl2
                    ON tb1.ATID = tbl2.ATID
                       AND tbl2.cnt <= 100; 

Note : If you dont want all the ATID's from tabl1 then move the tbl2.cnt <= 100 condition to Where clause so you will get the count of ATID's whose count is less than 100 in tbl2

Upvotes: 3

JohnHC
JohnHC

Reputation: 11205

For SQL server, use a CTE (faster than a sub-query). Also, unnecessary brackets in your query

with RowCo as
(
select ATID, count(*) as row_count
from tbl2
group by ATID
having row_count <= 100
)

SELECT tbl1.ATID, tbl1.ATDesc, tbl2.AValue, tbl2.ADesc, tbl2.APosNeg
FROM tbl1 
inner join RowCo
    on RowCo.ATID = tbl1.ATID
LEFT JOIN tbl2 
    ON tb1.ATID = tbl2.ATID

Upvotes: 1

Tab Alleman
Tab Alleman

Reputation: 31785

You've already done it, you just need to put your two queries together:

SELECT tbl1.ATID, tbl1.ATDesc, tbl2.AValue, tbl2.ADesc, tbl2.APosNeg
FROM tbl1 LEFT OUTER JOIN tbl2 ON (tb1.ATID = tbl2.ATID)
WHERE tbl1.ATID NOT IN (
  select ATID
  from tbl2
  group by ATID
  HAVING COUNT(*) >= 100
)

Upvotes: 3

Related Questions