Count distinct by multiple columns takes a long time

Question

I'm trying to perform count distinct in SQL server, by two fields.

I've tried two different ways. the first one is concatenation:

select count (distinct CONCAT ([UCID],[CALLSEGMENT]))
 from IVR_LINES
where UCID in (select UCID
from [EPMTest].[dbo].[IVR_LINES]
where MODULE = 'hozlap' and EVENT_NAME = 'A3' and EVENT_VALUE in ('1','2','3') and DATE> 20160911)
and EVENT_NAME = 'A6' or  EVENT_NAME = 'A7'

While the second one is by using a sub-query:

select COUNT(*) from(
select distinct UCID,CALLSEGMENT
 from IVR_LINES
where UCID in (select UCID
from [EPMTest].[dbo].[IVR_LINES]
where MODULE = 'hozlap' and EVENT_NAME = 'A3' and EVENT_VALUE in ('1','2','3') and DATE> 20160911)
and EVENT_NAME = 'A6' or  EVENT_NAME = 'A7'
)a

They take about the same running time (more than 10 seconds, probably due to the inner query itself). However, I would like to know - which one is more efficient? which method should I choose?

Thanks

MtwStark · Accepted Answer

I think something is wrong with your queries

first DATE > 20160911 should give an error, try to replace it with DATE > '20160911'

second and EVENT_NAME = 'A6' or EVENT_NAME = 'A7'
is VERY different from and (EVENT_NAME = 'A6' or EVENT_NAME = 'A7')

with your syntax the queries will count many more records than (I think) expected and this could be the problem of slow execution time

talking about which is the best... they are almost the same for the optimizer

you should add some info about primary key, indexes, number of records and some example of data.

Count distinct by multiple columns takes a long time

Answers (2)

Related Questions