Reputation: 8695
I often find myself running a query to get the number of people who meet a certain criteria, the total number of people in that population and the finding the percentage that meets that criteria. I've been doing it for the same way for a while and I was wondering what SO would do to solve the same type of problem. Below is how I wrote the query:
select m.state_cd
,m.injurylevel
,COUNT(distinct m.patid) as pplOnRx
,x.totalPatientsPerState
,round((COUNT(distinct m.patid) /cast(x.totalPatientsPerState as float))*100,2) as percentPrescribedNarcotics
from members as m
inner join rx on rx.patid=m.PATID
inner join DrugTable as dt on dt.drugClass=rx.drugClass
inner join
(
select m2.state_cd, m2.injurylevel, COUNT(distinct m2.patid) as totalPatientsPerState
from members as m2
inner join rx on rx.patid=m2.PATID
group by m2.STATE_CD,m2.injuryLevel
) x on x.state_cd=m.state_cd and m.injuryLevel=x.injurylevel
where drugText like '%narcotics%'
group by m.state_cd,m.injurylevel,x.totalPatientsPerState
order by m.STATE_CD,m.injuryLevel
In this example not everyone who appears in the members
table is in the rx
table. The derived table makes sure that everyone whose in rx
is also in members
without the condition of drugText like narcotics
. From what little I've played with it it seems that the over(partition by
clause might work here. I have no idea if it does, just seems like it to me. How would someone else go about tackling this problem?
results:
Upvotes: 0
Views: 172
Reputation: 591
This is exactly what MDX and SSAS is designed to do. If you insist on doing it in SQL (nothing wrong with that), are you asking for a way to do it with better performance? In that case, it would depend on how the tables are indexed, tempdb speed, and if the tables are partitioned, then that too.
Also, the distinct count is going to be one of larger performance hits. The like '%narcotics%'
in the predicate is going to force a full table scan and should be avoided at all costs (can this be an integer key in the data model?)
To answer your question, not really sure windowing (over partition by
) is going to perform any better. I would test it and see, but there is nothing "wrong" with the query.
You could rewrite the count distinct's as virtual tables or temp tables with group by's or a combination of those two.
To illustrate, this is a stub for windowing that you could grow into the same query:
select a.state_cd,a.injurylevel,a.totalpatid, count(*) over (partition by a.state_cd, a.injurylevel)
from
(select state_cd,injurylevel,count(*) as totalpatid, count(distinct patid) as patid
from
#members
group by state_cd,injurylevel
) a
see what I mean about not really being that helpful? Then again, sometimes rewriting a query slightly can improve performance by selecting a better execution plan, but rather then taking stabs in the dark, I'd first find the bottlenecks in the query you have, since you already took the time to write it.
Upvotes: 2