Reputation: 17548
I am stuck trying to write a (presto) SQL query over a table of logs.
Log schema is like this
time(int), column1(boolean)
time
is a integer representing unix timestamp
column1
is a boolean flag
I want to write a query that will give me the ratio of true to false for each day. This seems like a simple concept but I can't wrap my head around how to write this query.
What I have so far:
select
date_format(from_unixtime(time), '%Y-%m-%d') as ds,
column1,
count(*)
from tbl
group by
date_format(from_unixtime(time), '%Y-%m-%d'),
column1
order by ds desc
output looks like this:
2021-09-19, true, 103173
2021-09-19, false, 38713
2021-09-18, true, 87313
2021-09-18, false, 11231
...
How can I get the count of each distinct column1
value in one dimension, grouped by day, so I can calculate the ratio for each day?
I need output like this (the same output as above, except we get a ratio of true:false for each day):
2021-09-19, 72.7
2021-09-18, 88.8
The formula to calculate the ratio would be (true_count / (true_count + false_count)) * 100.0
Upvotes: 0
Views: 110
Reputation: 17548
just extending upon @Gordon Linoff answer since I'm using presto SQL (very slight difference in ternary expression syntax)
select date_format(from_unixtime(time), '%Y-%m-%d') as ds,
avg( case column1 when column1 = 'true' then 1.0 else 0 end) as ratio
from tbl
group by date_format(from_unixtime(time), '%Y-%m-%d'),
order by ds desc;
Upvotes: 0
Reputation: 1269853
If you want the ratio of true
for each day, you can use window functions:
select date_format(from_unixtime(time), '%Y-%m-%d') as ds,
avg( case when column1 = 'true' then 1.0 else 0 end) as ratio
from tbl
group by date_format(from_unixtime(time), '%Y-%m-%d'),
order by ds desc;
Your code looks like MySQL. If so, this can be further simplified:
select date_format(from_unixtime(time), '%Y-%m-%d') as ds,
avg( column1 = 'true' ) as ratio
from tbl
group by date_format(from_unixtime(time), '%Y-%m-%d'),
order by ds desc;
Upvotes: 1