AnsFourtyTwo
AnsFourtyTwo

Reputation: 2518

Why is query valid on sample_commits but not for commits in public github_repos dataset?

I am running into some strange behaviour of Google's BigQuery. The dataset bigquery-public-data.github_repos provides table commits and sample_commits. Schema of the tables should be the same, the difference is only in the size of the table, so one can develop queries without wasting limited data.

When I run the query below on the sample data, I get the correct result of 10 commits:

SELECT 
  commit AS commit, 
  repo_name AS repo_name,
  committer.date AS date
FROM 
  `bigquery-public-data.github_repos.sample_commits`
WHERE 
  repo_name = "torvalds/linux"
LIMIT 10

The strange thing is, when I run the query on the big table commits, I get an error message:

SELECT 
  commit AS commit, 
  repo_name AS repo_name,
  committer.date AS date
FROM 
  `bigquery-public-data.github_repos.commits`
WHERE 
  repo_name = "torvalds/linux"
LIMIT 10

The error message I get is:

No matching signature for operator = for argument types: ARRAY, STRING. Supported signatures: ANY = ANY at [8:3]

From the schema table I also know, that repo_name is of type STRING, so this error confuses me a lot.

Upvotes: 1

Views: 169

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Field repo_name is of STRING NULLABLE data type in bigquery-public-data.github_repos.sample_commits table

while same field in bigquery-public-data.github_repos.commits table is of STRING REPEATD data type

Try below instead

SELECT 
  commit AS commit, 
  repo_name AS repo_name,
  committer.date AS DATE
FROM 
  `bigquery-public-data.github_repos.commits`
WHERE 
  'torvalds/linux' IN UNNEST(repo_name)
LIMIT 10

Upvotes: 2

Related Questions