BigQuery Multiple Joins

Question

I have an Apache combined log file that loads into Bigquery. Which has a schema that consists of resource, place_id, ip, start_time, end_time, device, status. I am trying to run a query that counts the number of resources and number of devices and groups them by resource and device.

Table:

resource   |  place_id  |  device  |  ip      |  status  |
-----------------------------------------------------------------
/resource1 | 6750320008 |  android | x.x.x.x  |  200     |
/resource1 | 6750320100 |  ipad    | x.x.x.y  |  200     |
/resource2 | 6750320008 |  android | x.x.x.z  |  200     |

Query:

SELECT resource, device
FROM (
  Select 
    EXACT_COUNT_DISTINCT(resource) AS URL,
    1 AS scalar,
  FROM ([daily_logs.app_logs_data]) 
  WHERE place_id = '6750320008' GROUP BY URL) AS datal
JOIN (
  SELECT
    COUNT(device) as DeviceCount,
    1 AS scalar
  FROM ([daily_logs.app_logs_data]) GROUP BY DeviceCount) AS y
ON datal.scalar=y.scalar

I receive this error: Error: Cannot group by an aggregate.

I am basically tyring to create two tables from the same table that count different items and then I want to join them together but have them be grouped in order like this:

     URL   |  totalresourcecount  |  device  |  totaldevicecount
-----------------------------------------------------------------
/resource1 |          1           |  android |         1 
/resource1 |          1           |  ipad    |         1 
/resource2 |          1           |  android |         1

I have read through the google bigquery syntax help and looked at some examples but nothing has generated the desired result. Thanks in advance!

Mikhail Berlyant · Accepted Answer

Below is for BigQuery Standard SQL and reflects your logic presented in follow-up comments

#standardSQL
SELECT resource, device, COUNT(1) cnt
FROM `project.dataset.yourtable`
WHERE place_id = '6750320008'
GROUP BY resource, device

You can test / play with above using dummy data as below

#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '/resource1' resource, '6750320008' place_id, 'android' device, 'x.x.x.x' ip, 200 status UNION ALL
  SELECT '/resource1', '6750320100', 'ipad', 'x.x.x.y', 200 UNION ALL
  SELECT '/resource2', '6750320008', 'android', 'x.x.x.z', 200
)
SELECT resource, device, COUNT(1) cnt
FROM `project.dataset.yourtable`
WHERE place_id = '6750320008'
GROUP BY resource, device

Please note - above is based on how I understood your query logic expressed in your follow-up comments

BigQuery Multiple Joins

Answers (1)

Related Questions