Reputation: 1999
I have data that looks like this:
State Sex
---- ---
GA M
GA M
GA F
GA F
GA F
NY M
NY M
NY M
NY M
NY F
NY F
NY F
NY F
NY F
I want the result to be:
The query I am using is:
select t.state State,
M.count Male,
F.count Female,
count(t.state) Total,
CONCAT(ROUND(CAST(M.count as float)/CAST(count(t.state) as float)*100, 2), '%') as calc
from MyTable t
join
(
select state, count(sex) as count
from MyTable where sex ='M'
group by state) M
on t.state = M.state
join (
select state, count(sex) as count
from MyTable where sex ='F'
group by state) F
ON M.state = F.state
group by t.state, m.count, F.count;
The above query works but I am wondering if I did this in the most effecent way. This was done using SQLServer but I think this should be the same for all RDBMS. The link is here: http://sqlfiddle.com/#!18/7a969/87
Upvotes: 2
Views: 125
Reputation: 1269503
Use conditional aggregation:
select t.state,
sum(case when sex = 'M' then 1 else 0 end) as males,
sum(case when sex = 'F' then 1 else 0 end) as females,
count(*) as total,
avg(case when sex = 'M' then 1.0 else 0 end) as male_ratio
from MyTable t
group by t.state;
I would expect this to be the fastest method in just about any database.
Here is a SQL Fiddle.
Upvotes: 1
Reputation: 16554
It is not necessary to separate the data out into the tables for Male and Female. From a performance point of view, it might help improve performance if the sub-queries were able to make optimal use of indexes, but in reality the chances are low that you would be agregating over only indexed values.
For this simple query you could use simple CASE expressions inline to express the Male
/Female
columns as BIT
values, then we can SUM those values in a single aggregation, however that would require you to define the CASE for Male
twice, so you could use it in the Male
column and the % Male.
Instead of inline CASE we can use CROSS APPLY as a way to resolve calculations against each row once, and allow you to reference the result:
SELECT t.state State,
SUM(Calcs.IsMale) Male,
SUM(Calcs.IsFemale) Female,
COUNT(1) Total,
CONCAT(ROUND(SUM(Calcs.IsMale)/CAST(COUNT(1) as float)*100, 2), '%') as Calc
FROM MyTable t
CROSS APPLY (SELECT
CASE Sex WHEN 'M' THEN 1 END as [IsMale]
,CASE Sex WHEN 'F' THEN 1 END as [IsFemale]
) as calcs
GROUP BY [State]
Is this any more efficient though? In general it should be, this execution plan is much simpler than joining multiple aggregated sets, but its hard to say without a much larger dataset to test it against.
Either way, I would expect this simple CROSS apply version to win as we only have to process the resultset once.
When running the original and the CROSS APPLY on the given dataset and look at the actual execution plans, SQL Sever reports the CROSS APPLY query to be 25% of the relative cost for the batch:
I apologize in advance for posting this as an image, not really sure if there is a better way to have this discussion
This execution plan reports that the Original Query is 3 times the cost of the CROSS APPLY version, probably due to the 3 table scans in the first query, compared to the single table scan in the CROSS APPLY version.
Upvotes: 0
Reputation: 71451
You can compute the number of females by subtracting the number of males from the total count per state. This way, only a single join
is required:
with r as (select t.state s, count(*) c from testtable t group by t.state)
select r1.s, t1.m males, r1.c - t1.m females, r1.c total, 100*(t1.m/r1.c) m_percent
from r r1
join (select t.state s, t.sex, count(*) m from testtable t group by t.state, t.sex) t1 on r1.s = t1.s where t1.sex = "M";
Output:
state | males | females | total | m_percent |
---|---|---|---|---|
GA | 2 | 3 | 5 | 40.0000 |
NY | 4 | 5 | 9 | 44.4444 |
See demo.
Upvotes: 0