Reputation: 44399
I would like to use BigQuery Standard SQL for a query like this one:
SELECT package, COUNT(*) count
FROM (
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, id
FROM (
SELECT SPLIT(content, '\n') line, id
FROM [github-groovy-files:github.contents]
WHERE content CONTAINS 'import'
HAVING LEFT(line, 6)='import' )
GROUP BY package, id
)
GROUP BY 1
ORDER BY count DESC
LIMIT 30;
I cannot get past something like this (works but doesn't GROUP or COUNT):
with lines as
(SELECT SPLIT(c.content, '\n') line, c.id as id
FROM `<dataset>.contents` c, `<dataset>.files` f
WHERE c.id = f.id AND f.path LIKE '%.groovy')
select
array(select REGEXP_REPLACE(l, r'import |;', '') AS class from unnest(line) as l where l like 'import %') imports, id
from lines;
LEFT()
is not in Standard SQL and there doesn't seem to be a function that will accept and array type.
Upvotes: 0
Views: 205
Reputation: 173190
LEFT() is not in Standard SQL ...
In BigQuery Standard SQL you can use SUBSTR(value, position[, length])
instead of Legacy's LEFT
... and there doesn't seem to be a function that will accept and array type.
There are plenty of Array's related functions as well as functions that accept array as argument - for example UNNEST()
I would like to use BigQuery Standard SQL for a query like this one:
Below is equivalent query for BigQuery Standard SQL
SELECT package, COUNT(*) COUNT
FROM (
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, id
FROM (
SELECT line, id
FROM `github-groovy-files.github.contents`,
UNNEST(SPLIT(content, '\n')) line
WHERE SUBSTR(line, 1, 6)='import'
)
GROUP BY package, id
)
GROUP BY 1
ORDER BY COUNT DESC
LIMIT 30
Instead of WHERE SUBSTR(line, 1, 6)='import'
you can use WHERE line LIKE 'import%'
Also note, this query can be written in number of ways - so in my above example I focused on "translating" your query into from legacy to standard sql while preserving core structure and approach of original query
But if you woukld like to rewrite it using power of Standard SQL - you would ended up with something like below
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, COUNT(DISTINCT id) count
FROM `github-groovy-files.github.contents`,
UNNEST(SPLIT(content, '\n')) line
WHERE line LIKE 'import%'
GROUP BY 1
ORDER BY count DESC
LIMIT 30
Upvotes: 3