Reputation: 1
I am stuck on solving this issue and it should be nice hearing new fresh ideas :)
I have a table with billions of records like this
TAB_IX (int) (PK)
TAB_ID (int) (PK)
PR_ID (int) (PK)
SP_ID (int) (PK)(IX)
....
Previously I was retrieving data like this
SELECT TAB_ID, COUNT (SP_ID) as HITS FROM table t
INNER JOIN table_sp s on t.SP_ID = s.ID
WHERE TAB_IX = @tab_inx
AND PR_ID IN (SELECT PR_ID FROM @pr_id)
AND s.NAME IN (SELECT DISTINCT NAME FROM @sp_names)
GROUP BY TAB_ID
table_sp is a little table with 10k of records (ID (int) (PK), NAME (varchar) (IX))
@pr_id and @sp_names are table variables with one column
The query was very fast (about 2-3 sec); now I don't want to distinguish records with different PR_ID and same TAB_IX, TAB_ID, SP_ID
So for example records like
TAB_IX - TAB_ID - PR_ID - SP_ID
1 - 700 - 1 - 100
1 - 700 - 2 - 100
Should be considered as one.
The only way seems doing an additional GROUP BY
Like this
SELECT TAB_ID, COUNT(SP_ID) as HITS FROM (
SELECT TAB_ID, SP_ID, COUNT (PR_ID) FROM table
WHERE TAB_IX = @tab_inx
AND PR_ID in (select PR_ID from @pr_id)
AND s.NAME IN (SELECT DISTINCT NAME FROM @sp_names)
GROUP BY TAB_ID, SP_ID) AS DUMMY
GROUP BY TAB_ID
The problem is the performance, because adding this additional GROUP BY operation looks very painful.
Do you have any idea for improving the query?
Thanks in advance :)
Upvotes: 0
Views: 560
Reputation: 17162
I suppose that specifying in the original query that you want to count DISTINCT SP_ID
will do the trick
SELECT TAB_ID, COUNT (DISTINCT SP_ID) as HITS FROM table t
INNER JOIN table_sp s on t.SP_ID = s.ID
WHERE TAB_IX = @tab_inx
AND PR_ID IN (SELECT PR_ID FROM @pr_id)
AND s.NAME IN (SELECT DISTINCT NAME FROM @sp_names)
GROUP BY TAB_ID
Upvotes: 1