Reputation: 192
I have timeseries data like this:
time | id | value |
---|---|---|
2018-04-25 22:00:00 UTC | A | 1 |
2018-04-25 23:00:00 UTC | A | 2 |
2018-04-25 23:00:00 UTC | A | 2.1 |
2018-04-25 23:00:00 UTC | B | 1 |
2018-04-26 23:00:00 UTC | B | 1.3 |
How do i write a query to produce an output table with these columns:
time
, id
are not unique. In the example data above the two records with id==A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25So the output of our query should be:
date | records | records_conflicting_time_id |
---|---|---|
2018-04-25 | 4 | 2 |
2018-04-26 | 1 | 0 |
Getting records
is easy, i just truncate the time
to get date and then group by date
. But i'm really struggling to produce a column that counts the number of records where id
+ time
is not unique over that date...
Upvotes: 0
Views: 821
Reputation: 172994
Consider below approach
select date(time) date,
sum(cnt) records,
sum(if(cnt > 1, cnt, 0)) conflicting_records
from (
select time, id, count(*) cnt
from your_table
group by time, id
)
group by date
if applied to sample data in your question - output is
Upvotes: 1
Reputation: 1132
with YOUR_DATA as
(
select cast('2018-04-25 22:00:00 UTC' as timestamp) as `time`, 'A' as id, 1.0 as value
union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.0 as value
union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.1 as value
union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.0 as value
union all select cast('2018-04-26 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.3 as value
)
select cast(timestamp_trunc(t1.`time`, day) as date) as `date`,
count(*) as records,
case when count(*)-count(distinct cast(t1.`time` as string) || t1.id) = 0 then 0
else count(*)-count(distinct cast(t1.`time` as string) || t1.id)+1
end as records_conflicting_time_id
from YOUR_DATA t1
group by cast(timestamp_trunc(t1.`time`, day) as date)
;
Upvotes: 0