Reputation: 422
I have a group of BigQuery tables which collect test results on a day by day basis, each table being named like various_tests.test_name_20190523
. I have a query which I can run over specified date ranges to find the number of failures and ratio of failures to all tests, but I'd prefer to get a table with several date ranges each as a row in the table, e.g. BETWEEN "20190901" AND "20190916", BETWEEN "20190916" AND "20191001", BETWEEN "20191001" AND "20191016"
. The columns of each row will be the same as the single-row result here. What's a good way to do this?
SELECT
"20190916" as StartDate, "20191001" as EndDate,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE
_TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
(The real query has several more conditions in the WHERE
and CASE
clauses but they are elided for clarity.)
Upvotes: 2
Views: 2847
Reputation: 5503
One way is to use scripting. Below example uses bigquery-public-data.google_analytics_sample.ga_sessions_*
to demonstrate the idea. You can easily adapt it to your case.
Also date_ranges
can be generated based on your need.
DECLARE date_ranges ARRAY<STRUCT<s STRING, e STRING>>
DEFAULT [
('20170801', '20170802'),
('20170703', '20170704'),
('20170603', '20170604')
];
DECLARE index INT64 DEFAULT 0;
CREATE TEMP TABLE result(s STRING, e STRING, cnt INT64);
LOOP
IF index = array_length(date_ranges)
THEN BREAK;
END IF;
BEGIN
DECLARE date_start STRING DEFAULT date_ranges[OFFSET(index)].s;
DECLARE date_end STRING DEFAULT date_ranges[OFFSET(index)].e;
INSERT INTO result
SELECT date_start, date_end, count(*) cnt
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN date_start and date_end ;
SET index = index + 1;
END;
END LOOP;
SELECT * FROM result;
Output
+----------+----------+------+
| s | e | cnt |
+----------+----------+------+
| 20170703 | 20170704 | 3984 |
| 20170603 | 20170604 | 2933 |
| 20170801 | 20170802 | 2556 |
+----------+----------+------+
Cost
Same cost as manually change start_date/end_date to run multiple times.
Performance
Not as efficient as single query (which you have to manually cook) since multiple INSERT INTO has to be run separately.
Scalability
Temp table is still subject to daily DML quota per table, so only 1000 inserts can be done in single script.
Upvotes: 1
Reputation: 172993
Try below
SELECT
-- "20190916" as StartDate, "20191001" as EndDate,
_TABLE_SUFFIX AS Day,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE
_TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
GROUP BY DAY
way of making several date ranges and running the query for each
SELECT
-- "20190916" as StartDate, "20191001" as EndDate,
CASE
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20190916"
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20180916"
END AS StartDate,
CASE
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20191001"
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20181001"
END AS EndDate,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001"
GROUP BY StartDate, EndDate
... an easier way ...
To avoid repetition of all conditions - try something like below
SELECT
-- "20190916" as StartDate, "20191001" as EndDate,
(CASE
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN STRUCT("20190916" AS StartDate, "20191001" AS EndDate)
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN ("20180916", "20181001")
END).*,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001"
GROUP BY StartDate, EndDate
Upvotes: 0