Reputation: 13009
I have a table with the following structure:
name | version | processed | processing | updated | ref_time
------+---------+-----------+------------+----------+----------
abc | 1 | t | f | 27794395 | 27794160
def | 1 | t | f | 27794395 | 27793440
ghi | 1 | t | f | 27794395 | 27793440
jkl | 1 | f | f | 27794395 | 27794160
mno | 1 | t | f | 27794395 | 27793440
pqr | 1 | f | t | 27794395 | 27794160
I can use the following query to count the total number within each ref_time
:
SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time;
ref_time | total
----------+-------
27794160 | 2259
27793440 | 2259
And the following query to count the total number within each ref_time
where processed=true
:
SELECT ref_time, COUNT (*) AS processed FROM (SELECT * FROM status_table WHERE processed=true) AS _ GROUP BY ref_time;
ref_time | processed
----------+-----------
27794160 | 1057
27793440 | 2259
I then try to merge the information using an INNER JOIN
on ref_time
:
SELECT * FROM
(SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time) result_total
INNER JOIN
(SELECT ref_time, COUNT (*) AS processed
FROM (SELECT * FROM status_table WHERE processed=true) AS _
GROUP BY ref_time) result_processed
ON result_total.ref_time = result_processed.ref_time;
ref_time | total | ref_time | processed
----------+-------+----------+-----------
27794160 | 2259 | 27794160 | 1057
27793440 | 2259 | 27793440 | 2259
First question: how do I avoid the duplicated ref_time
column?
Second question: how do I add an additional percent
column derived as (100 * processed / total)
(to one d.p.), i.e. to give:
ref_time | total | processed | percent
----------+-------+-----------+---------
27794160 | 2259 | 1057 | 46.8
27793440 | 2259 | 2259 | 100.0
Third question: is there a more efficient way to do this? Can I avoid making two separate SELECT
queries?
Upvotes: 1
Views: 156
Reputation: 656962
SELECT ref_time
, count(*) AS total
, count(*) FILTER (WHERE processed) AS processed
, round(count(*) FILTER (WHERE processed) * 100.0 / count(*), 2) AS percent
FROM status_table
GROUP BY 1;
Returning actual, rounded percentage values.
avg(processed::int)
, like the currently accepted answer does, is smart but the additional aggregate function plus involved casting adds cost. (The repeated count(*)
is only computed once.) This is typically faster.
About the aggregate FILTER
clause:
Aside: "name" is not a good name. Reconsider.
Upvotes: 1
Reputation: 222482
Postgres has expressive aggregate functions.
To do the conditional count, we can use the standard filter
clause directly against column processed
, which is a boolean. As for the percentage (or ratio), we can cast the boolean to an integer (which yields 0
or 1
, as you would expect), and take the average of that.
So:
select ref_time,
count(*) cnt_total,
count(*) filter(where processed) cnt_processed,
avg(processed::int) ratio_processed
from mytable
group by ref_time
Here is a demo with your sample data, which returns:
ref_time | cnt_total | cnt_processed | ratio_processed |
---|---|---|---|
27794160 | 3 | 1 | 0.33333333333333333333 |
27793440 | 3 | 3 | 1.00000000000000000000 |
Upvotes: 2
Reputation: 1401
with main as (
select
ref_time,
sum(case when processed = 'true' then 1 else 0 end ) as total_processed,
count(*) as total
from <table_name>
group by 1
)
select *, round((total_processed::numeric / nullif(total::numeric,0)) * 100),2) as percent from main
Upvotes: 1
Reputation: 5217
SELECT ref_time,count(*)as total,
SUM
(
CASE
WHEN processed='t' then 1
else 0
END
)processed
FROM YOUR_TABLE
GROUP BY ref_time
Upvotes: 1