James Wierzba
James Wierzba

Reputation: 17548

How to get counts of distinct values for a column binned by time in SQL?

I am stuck trying to write a (presto) SQL query over a table of logs.

Log schema is like this

time(int), column1(boolean)

time is a integer representing unix timestamp

column1 is a boolean flag

I want to write a query that will give me the ratio of true to false for each day. This seems like a simple concept but I can't wrap my head around how to write this query.

What I have so far:

select 
    date_format(from_unixtime(time), '%Y-%m-%d') as ds,
    column1,
    count(*)
from tbl
group by 
    date_format(from_unixtime(time), '%Y-%m-%d'), 
    column1
order by ds desc

output looks like this:

2021-09-19, true, 103173
2021-09-19, false, 38713
2021-09-18, true, 87313
2021-09-18, false, 11231
...

How can I get the count of each distinct column1 value in one dimension, grouped by day, so I can calculate the ratio for each day?

I need output like this (the same output as above, except we get a ratio of true:false for each day):

2021-09-19, 72.7
2021-09-18, 88.8

The formula to calculate the ratio would be (true_count / (true_count + false_count)) * 100.0

Upvotes: 0

Views: 110

Answers (2)

James Wierzba
James Wierzba

Reputation: 17548

just extending upon @Gordon Linoff answer since I'm using presto SQL (very slight difference in ternary expression syntax)

select date_format(from_unixtime(time), '%Y-%m-%d') as ds,
       avg( case column1 when column1 = 'true' then 1.0 else 0 end) as ratio
from tbl
group by date_format(from_unixtime(time), '%Y-%m-%d'), 
order by ds desc;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269853

If you want the ratio of true for each day, you can use window functions:

select date_format(from_unixtime(time), '%Y-%m-%d') as ds,
       avg( case when column1 = 'true' then 1.0 else 0 end) as ratio
from tbl
group by date_format(from_unixtime(time), '%Y-%m-%d'), 
order by ds desc;

Your code looks like MySQL. If so, this can be further simplified:

select date_format(from_unixtime(time), '%Y-%m-%d') as ds,
       avg( column1 = 'true' ) as ratio
from tbl
group by date_format(from_unixtime(time), '%Y-%m-%d'), 
order by ds desc;

Upvotes: 1

Related Questions