Query that counts total records per day and total records with same time timestamp and id per day in Bigquery

Question

I have timeseries data like this:

time	id	value
2018-04-25 22:00:00 UTC	A	1
2018-04-25 23:00:00 UTC	A	2
2018-04-25 23:00:00 UTC	A	2.1
2018-04-25 23:00:00 UTC	B	1
2018-04-26 23:00:00 UTC	B	1.3

How do i write a query to produce an output table with these columns:

date: the truncated time
records: the number of records during this date
records_conflicting_time_id: the number of records during this date where the combination of time, id are not unique. In the example data above the two records with id==A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25

So the output of our query should be:

date	records	records_conflicting_time_id
2018-04-25	4	2
2018-04-26	1	0

Getting records is easy, i just truncate the time to get date and then group by date. But i'm really struggling to produce a column that counts the number of records where id + time is not unique over that date...

Mikhail Berlyant · Accepted Answer

Consider below approach

select date(time) date, 
  sum(cnt) records, 
  sum(if(cnt > 1, cnt, 0)) conflicting_records
from (
  select time, id, count(*) cnt
  from your_table
  group by time, id
)
group by date

if applied to sample data in your question - output is

Query that counts total records per day and total records with same time timestamp and id per day in Bigquery

Answers (2)

Related Questions