Reputation: 73
I'm trying to perform count distinct in SQL server, by two fields.
I've tried two different ways. the first one is concatenation:
select count (distinct CONCAT ([UCID],[CALLSEGMENT]))
from IVR_LINES
where UCID in (select UCID
from [EPMTest].[dbo].[IVR_LINES]
where MODULE = 'hozlap' and EVENT_NAME = 'A3' and EVENT_VALUE in ('1','2','3') and DATE> 20160911)
and EVENT_NAME = 'A6' or EVENT_NAME = 'A7'
While the second one is by using a sub-query:
select COUNT(*) from(
select distinct UCID,CALLSEGMENT
from IVR_LINES
where UCID in (select UCID
from [EPMTest].[dbo].[IVR_LINES]
where MODULE = 'hozlap' and EVENT_NAME = 'A3' and EVENT_VALUE in ('1','2','3') and DATE> 20160911)
and EVENT_NAME = 'A6' or EVENT_NAME = 'A7'
)a
They take about the same running time (more than 10 seconds, probably due to the inner query itself). However, I would like to know - which one is more efficient? which method should I choose?
Thanks
Upvotes: 0
Views: 96
Reputation: 4048
I think something is wrong with your queries
first DATE > 20160911
should give an error,
try to replace it with DATE > '20160911'
second and EVENT_NAME = 'A6' or EVENT_NAME = 'A7'
is VERY different from and (EVENT_NAME = 'A6' or EVENT_NAME = 'A7')
with your syntax the queries will count many more records than (I think) expected and this could be the problem of slow execution time
talking about which is the best... they are almost the same for the optimizer
you should add some info about primary key, indexes, number of records and some example of data.
Upvotes: 0
Reputation: 5398
Please post your execution plan to find where exactly it takes time. Meanwhile you can rewrite your query with exists clause like this.
SELECT COUNT(*)
FROM (
SELECT DISTINCT UCID
,CALLSEGMENT
FROM IVR_LINES IL
WHERE EXISTS (
SELECT 1
FROM [EPMTest].[dbo].[IVR_LINES]
WHERE MODULE = 'hozlap'
AND EVENT_NAME = 'A3'
AND EVENT_VALUE IN (
'1'
,'2'
,'3'
)
AND UCID = IL.UCID
AND DATE > 20160911
)
AND EVENT_NAME IN (
'A6'
,'A7'
)
) A
Upvotes: 1