JoinZ
JoinZ

Reputation: 1

SQL - Avoid an additional GROUP BY (and improving query performance)

I am stuck on solving this issue and it should be nice hearing new fresh ideas :)

I have a table with billions of records like this

TAB_IX (int) (PK)
TAB_ID (int) (PK)
PR_ID (int) (PK)
SP_ID (int) (PK)(IX)
....

Previously I was retrieving data like this

SELECT TAB_ID, COUNT (SP_ID) as HITS FROM table t
INNER JOIN table_sp s on t.SP_ID = s.ID
WHERE TAB_IX = @tab_inx 
AND PR_ID IN (SELECT PR_ID FROM @pr_id)
AND s.NAME IN (SELECT DISTINCT NAME FROM @sp_names)  
GROUP BY TAB_ID

table_sp is a little table with 10k of records (ID (int) (PK), NAME (varchar) (IX))

@pr_id and @sp_names are table variables with one column

The query was very fast (about 2-3 sec); now I don't want to distinguish records with different PR_ID and same TAB_IX, TAB_ID, SP_ID

So for example records like

TAB_IX - TAB_ID - PR_ID - SP_ID
1      - 700    - 1     - 100
1      - 700    - 2     - 100

Should be considered as one.

The only way seems doing an additional GROUP BY

Like this

SELECT TAB_ID, COUNT(SP_ID) as HITS FROM (
SELECT TAB_ID, SP_ID, COUNT (PR_ID) FROM table 
WHERE TAB_IX = @tab_inx 
AND PR_ID in (select PR_ID from @pr_id)
AND s.NAME IN (SELECT DISTINCT NAME FROM @sp_names)
GROUP BY TAB_ID, SP_ID) AS DUMMY
GROUP BY TAB_ID

The problem is the performance, because adding this additional GROUP BY operation looks very painful.

Do you have any idea for improving the query?

Thanks in advance :)

Upvotes: 0

Views: 560

Answers (1)

ybo
ybo

Reputation: 17162

I suppose that specifying in the original query that you want to count DISTINCT SP_ID will do the trick

SELECT TAB_ID, COUNT (DISTINCT SP_ID) as HITS FROM table t
INNER JOIN table_sp s on t.SP_ID = s.ID
WHERE TAB_IX = @tab_inx 
AND PR_ID IN (SELECT PR_ID FROM @pr_id)
AND s.NAME IN (SELECT DISTINCT NAME FROM @sp_names)  
GROUP BY TAB_ID

Upvotes: 1

Related Questions