inbal_bpr
inbal_bpr

Reputation: 73

Count distinct by multiple columns takes a long time

I'm trying to perform count distinct in SQL server, by two fields.

I've tried two different ways. the first one is concatenation:

select count (distinct CONCAT ([UCID],[CALLSEGMENT]))
 from IVR_LINES
where UCID in (select UCID
from [EPMTest].[dbo].[IVR_LINES]
where MODULE = 'hozlap' and EVENT_NAME = 'A3' and EVENT_VALUE in ('1','2','3') and DATE> 20160911)
and EVENT_NAME = 'A6' or  EVENT_NAME = 'A7'

While the second one is by using a sub-query:

select COUNT(*) from(
select distinct UCID,CALLSEGMENT
 from IVR_LINES
where UCID in (select UCID
from [EPMTest].[dbo].[IVR_LINES]
where MODULE = 'hozlap' and EVENT_NAME = 'A3' and EVENT_VALUE in ('1','2','3') and DATE> 20160911)
and EVENT_NAME = 'A6' or  EVENT_NAME = 'A7'
)a

They take about the same running time (more than 10 seconds, probably due to the inner query itself). However, I would like to know - which one is more efficient? which method should I choose?

Thanks

Upvotes: 0

Views: 96

Answers (2)

MtwStark
MtwStark

Reputation: 4048

I think something is wrong with your queries

first DATE > 20160911 should give an error, try to replace it with DATE > '20160911'

second and EVENT_NAME = 'A6' or EVENT_NAME = 'A7'
is VERY different from and (EVENT_NAME = 'A6' or EVENT_NAME = 'A7')

with your syntax the queries will count many more records than (I think) expected and this could be the problem of slow execution time

talking about which is the best... they are almost the same for the optimizer

you should add some info about primary key, indexes, number of records and some example of data.

Upvotes: 0

StackUser
StackUser

Reputation: 5398

Please post your execution plan to find where exactly it takes time. Meanwhile you can rewrite your query with exists clause like this.

SELECT COUNT(*)
FROM (
    SELECT DISTINCT UCID
        ,CALLSEGMENT
    FROM IVR_LINES IL
    WHERE EXISTS (
            SELECT 1
            FROM [EPMTest].[dbo].[IVR_LINES]
            WHERE MODULE = 'hozlap'
                AND EVENT_NAME = 'A3'
                AND EVENT_VALUE IN (
                    '1'
                    ,'2'
                    ,'3'
                    )
                AND UCID = IL.UCID
                AND DATE > 20160911
            )
        AND EVENT_NAME IN (
            'A6'
            ,'A7'
            )
    ) A

Upvotes: 1

Related Questions