Derived Tables for summary statistics

Question

I often find myself running a query to get the number of people who meet a certain criteria, the total number of people in that population and the finding the percentage that meets that criteria. I've been doing it for the same way for a while and I was wondering what SO would do to solve the same type of problem. Below is how I wrote the query:

select m.state_cd
    ,m.injurylevel
    ,COUNT(distinct m.patid) as pplOnRx
    ,x.totalPatientsPerState
    ,round((COUNT(distinct m.patid) /cast(x.totalPatientsPerState as float))*100,2) as percentPrescribedNarcotics
    from members as m
    inner join rx on rx.patid=m.PATID
    inner join DrugTable as dt on dt.drugClass=rx.drugClass
    inner join 
    (
        select m2.state_cd, m2.injurylevel, COUNT(distinct m2.patid) as totalPatientsPerState
            from members as m2
            inner join rx on rx.patid=m2.PATID
            group by m2.STATE_CD,m2.injuryLevel
    ) x on x.state_cd=m.state_cd and m.injuryLevel=x.injurylevel
    where drugText like '%narcotics%'
    group by m.state_cd,m.injurylevel,x.totalPatientsPerState
    order by m.STATE_CD,m.injuryLevel

In this example not everyone who appears in the members table is in the rx table. The derived table makes sure that everyone whose in rx is also in members without the condition of drugText like narcotics. From what little I've played with it it seems that the over(partition by clause might work here. I have no idea if it does, just seems like it to me. How would someone else go about tackling this problem?

results:

enter image description here

mrkb80 · Accepted Answer

This is exactly what MDX and SSAS is designed to do. If you insist on doing it in SQL (nothing wrong with that), are you asking for a way to do it with better performance? In that case, it would depend on how the tables are indexed, tempdb speed, and if the tables are partitioned, then that too.

Also, the distinct count is going to be one of larger performance hits. The like '%narcotics%' in the predicate is going to force a full table scan and should be avoided at all costs (can this be an integer key in the data model?)

To answer your question, not really sure windowing (over partition by) is going to perform any better. I would test it and see, but there is nothing "wrong" with the query.

You could rewrite the count distinct's as virtual tables or temp tables with group by's or a combination of those two.

To illustrate, this is a stub for windowing that you could grow into the same query:

select a.state_cd,a.injurylevel,a.totalpatid, count(*) over (partition by a.state_cd, a.injurylevel)
from
(select state_cd,injurylevel,count(*) as totalpatid, count(distinct patid) as patid 
from
#members
group by state_cd,injurylevel
)  a

see what I mean about not really being that helpful? Then again, sometimes rewriting a query slightly can improve performance by selecting a better execution plan, but rather then taking stabs in the dark, I'd first find the bottlenecks in the query you have, since you already took the time to write it.

Derived Tables for summary statistics

Answers (1)

Related Questions