Reputation: 2067
I was playing with bigquery and ran into a problem, my Query was:
SELECT * FROM (
SELECT a.title, a.counter , MAX(b.num_characters) as max
FROM (
SELECT title, count(*) as counter FROM publicdata:samples.wikipedia
GROUP EACH BY title
ORDER BY counter DESC
LIMIT 10
) a JOIN
(SELECT title,num_characters FROM publicdata:samples.wikipedia
) b ON a.title = b.title
GROUP BY a.title, a.counter)
LIMIT 1;
Although this is valid, I'm getting response too large to return. The first Subquery is running fine, what I want to do is get a bit more column for it. But I fail.
Upvotes: 1
Views: 379
Reputation: 59165
Don't worry about the "limit 1", the response gets too large before reaching that stage.
Try skipping the second subquery, as it is only selecting 2 columns from the large dataset, without filtering it. A working alternative is:
SELECT
a.title, a.counter, MAX(b.num_characters) AS max
FROM
publicdata:samples.wikipedia b JOIN(
SELECT
title, COUNT(*) AS counter
FROM
publicdata:samples.wikipedia
GROUP EACH BY title
ORDER BY
counter DESC
LIMIT 10) a
ON a.title = b.title
GROUP BY
a.title,
a.counter
This runs in 15.4 seconds.
We can do it faster, using TOP():
SELECT
a.title title, counter, MAX(num_characters) max
FROM
publicdata:samples.wikipedia b
JOIN
(
SELECT
TOP(title, 10) AS title, COUNT(*) AS counter
FROM
publicdata:samples.wikipedia
) a
ON a.title=b.title
GROUP BY
title, counter
TOP() acts as a simpler and faster (SELECT COUNT(*)/GROUP/LIMIT).
https://developers.google.com/bigquery/docs/query-reference#top-function
Now it runs in only 6.5s, processing 15.9 GB.
Upvotes: 2