Reputation: 2518
I am running into some strange behaviour of Google's BigQuery. The dataset bigquery-public-data.github_repos
provides table commits
and sample_commits
. Schema of the tables should be the same, the difference is only in the size of the table, so one can develop queries without wasting limited data.
When I run the query below on the sample data, I get the correct result of 10 commits:
SELECT
commit AS commit,
repo_name AS repo_name,
committer.date AS date
FROM
`bigquery-public-data.github_repos.sample_commits`
WHERE
repo_name = "torvalds/linux"
LIMIT 10
The strange thing is, when I run the query on the big table commits
, I get an error message:
SELECT
commit AS commit,
repo_name AS repo_name,
committer.date AS date
FROM
`bigquery-public-data.github_repos.commits`
WHERE
repo_name = "torvalds/linux"
LIMIT 10
The error message I get is:
No matching signature for operator = for argument types: ARRAY, STRING. Supported signatures: ANY = ANY at [8:3]
From the schema table I also know, that repo_name
is of type STRING
, so this error confuses me a lot.
Upvotes: 1
Views: 169
Reputation: 172994
Field repo_name
is of STRING NULLABLE data type in bigquery-public-data.github_repos.sample_commits
table
while same field in bigquery-public-data.github_repos.commits
table is of STRING REPEATD data type
Try below instead
SELECT
commit AS commit,
repo_name AS repo_name,
committer.date AS DATE
FROM
`bigquery-public-data.github_repos.commits`
WHERE
'torvalds/linux' IN UNNEST(repo_name)
LIMIT 10
Upvotes: 2