Reputation: 2886
I'm attempting to determine the average time between two events in my Firebase analytics using BigQuery. The table looks something like this:
I'd like to collect the timestamp_micros for the LOGIN_CALL and LOGIN_CALL_OK events, subtract the LOGIN_CALL timestamp from the LOGIN_CALL_OK timestamp, and compute the average of that difference across all rows.
#standardSQL
SELECT AVG(
  (SELECT event.timestamp_micros
   FROM `table`, UNNEST(event_dim) AS event
   WHERE event.name = "LOGIN_CALL_OK") -
  (SELECT event.timestamp_micros
   FROM `table`, UNNEST(event_dim) AS event
   WHERE event.name = "LOGIN_CALL"))
FROM `table`
I've managed to list either the low or the high values, but any time I try to do any math on them I run into errors I'm struggling to pull apart. The approach above seems like it should work, but I get the following error:
Error: Scalar subquery produced more than one element
I read this error to mean that each of the UNNEST() subqueries returns a set of rows rather than a single value, which is what causes AVG to fail. I've tried to unnest once and assign "low" and "high" names to the values, but I can't figure out how to filter on event_dim.name correctly.
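For what it's worth, a stripped-down query (an illustrative toy, not against my real table) triggers the same error, which suggests the problem is the subquery producing more than one row:

```sql
#standardSQL
-- The subquery yields two rows (1 and 2), not a single value, so BigQuery
-- raises "Scalar subquery produced more than one element".
SELECT (SELECT x FROM UNNEST([1, 2]) AS x)
```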
Upvotes: 0
Views: 2303
Reputation: 11787
I couldn't fully test this one, but maybe it will work for you:
WITH data AS (
  SELECT STRUCT('1' AS user_id) user_dim, ARRAY<STRUCT<date STRING, name STRING, timestamp_micros INT64>> [('20170610', 'EVENT1', 1497088800000000), ('20170610', 'LOGIN_CALL', 1498088800000000), ('20170610', 'LOGIN_CALL_OK', 1498888800000000), ('20170610', 'EVENT2', 159788800000000), ('20170610', 'LOGIN_CALL', 1599088800000000), ('20170610', 'LOGIN_CALL_OK', 1608888800000000)] event_dim UNION ALL
  SELECT STRUCT('2' AS user_id) user_dim, ARRAY<STRUCT<date STRING, name STRING, timestamp_micros INT64>> [('20170610', 'EVENT1', 1497688500400000), ('20170610', 'LOGIN_CALL', 1497788800000000)] event_dim UNION ALL
  SELECT STRUCT('3' AS user_id) user_dim, ARRAY<STRUCT<date STRING, name STRING, timestamp_micros INT64>> [('20170610', 'EVENT1', 1487688500400000), ('20170610', 'LOGIN_CALL', 1487788845000000), ('20170610', 'LOGIN_CALL_OK', 1498888807700000)] event_dim
)
SELECT
  AVG(time_diff) avg_time_diff
FROM (
  SELECT
    CASE
      WHEN e.name = 'LOGIN_CALL'
       AND LEAD(e.name, 1) OVER (PARTITION BY user_dim.user_id ORDER BY e.timestamp_micros ASC) = 'LOGIN_CALL_OK'
      THEN TIMESTAMP_DIFF(
        TIMESTAMP_MICROS(LEAD(e.timestamp_micros, 1) OVER (PARTITION BY user_dim.user_id ORDER BY e.timestamp_micros ASC)),
        TIMESTAMP_MICROS(e.timestamp_micros),
        DAY)
    END time_diff
  FROM data,
  UNNEST(event_dim) e
  WHERE e.name IN ('LOGIN_CALL', 'LOGIN_CALL_OK')
)
I've simulated 3 users with the same schema that you have in Firebase Schema.
Basically, I first applied the UNNEST operation so that each value of event_dim.name gets its own row. Then I filtered for just the events you are interested in, that is, "LOGIN_CALL" and "LOGIN_CALL_OK".
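To make the UNNEST step concrete, here is a stripped-down query over a small inline array (the values are made up for illustration), showing how each array element becomes its own row:

```sql
#standardSQL
-- UNNEST flattens the array: one output row per array element.
SELECT e.name, e.timestamp_micros
FROM UNNEST(ARRAY<STRUCT<name STRING, timestamp_micros INT64>>
  [('LOGIN_CALL', 1498088800000000), ('LOGIN_CALL_OK', 1498888800000000)]) AS e
```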
As Mosha commented above, you do need some identification for these rows, as otherwise you won't know which event succeeded which; that's why the partitioning in the analytic functions takes user_dim.user_id as input as well.
After that, it's just TIMESTAMP operations to take the differences where appropriate: when the current event is "LOGIN_CALL" and the event that follows it is "LOGIN_CALL_OK", compute the difference. This is what the CASE expression does.
In the TIMESTAMP_DIFF function you can choose which unit you want the result in, such as seconds, minutes, days, and so on.
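For example, the unit is the last argument to TIMESTAMP_DIFF (the timestamp literals here are made up just to show the units):

```sql
#standardSQL
SELECT
  TIMESTAMP_DIFF(TIMESTAMP '2017-06-10 01:00:00',
                 TIMESTAMP '2017-06-10 00:00:00', SECOND) AS diff_s,  -- 3600
  TIMESTAMP_DIFF(TIMESTAMP '2017-06-10 01:00:00',
                 TIMESTAMP '2017-06-10 00:00:00', MINUTE) AS diff_m   -- 60
```

For measuring login latency, SECOND or even MILLISECOND is likely more useful than the DAY granularity used in the query above.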
Upvotes: 4