GG123GG

Reputation: 25

SQL/BigQuery on github samples

I'm using Google BigQuery and trying to select ALL sample GitHub repositories that have a pom.xml file whose content includes the artifact ID ex-ex, e.g. <artifactId>ex-ex</artifactId>

I have broken this down into two steps:

1) Find all pom.xml files

SELECT sample_repo_name FROM `bigquery-public-data.github_repos.sample_contents` WHERE sample_path LIKE 'pom.xml'

2) Select the repositories whose pom.xml contains the ex-ex artifact (in the content column)

AND content LIKE '%ex-ex'

The second part of the query does not work (it returns no results), which is probably due to a syntax error somewhere. Here is the full query:

SELECT sample_repo_name FROM `bigquery-public-data.github_repos.sample_contents` WHERE sample_path LIKE 'pom.xml' AND content LIKE '%ex-ex' LIMIT 1000

Would really appreciate help with this, thanks!

Upvotes: 0

Views: 143

Answers (1)

rtenha

Reputation: 3616

Have you tried '%ex-ex%'? Without the trailing %, you are only matching records whose last five characters are 'ex-ex'. If you add content to the SELECT list of your first query and spot-check a few results, the content field turns out to be XML (pom.xml, duh), and the files seem to end with </project>, so they will probably never match '%ex-ex'.
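To see the difference without running BigQuery, here is a minimal local sketch that uses Python's built-in sqlite3 module as a stand-in. It assumes SQLite's LIKE treats % the same way BigQuery's does for these inputs. One known difference is that SQLite's LIKE is case-insensitive for ASCII by default, while BigQuery's is case-sensitive. The two pom.xml bodies and the `matches` helper are made up for illustration. The sketch also tries a tighter, hypothetical pattern that includes the surrounding tags. That pattern avoids matching artifacts such as ex-ex-utils, which merely contain ex-ex.

```python
import sqlite3
from contextlib import closing

# Made-up, trimmed pom.xml bodies; like the files spot-checked in the answer,
# they end with </project> rather than with the artifact ID.
POM = """<project>
  <artifactId>ex-ex</artifactId>
</project>"""

OTHER = """<project>
  <artifactId>ex-ex-utils</artifactId>
</project>"""


def matches(pattern: str, text: str) -> bool:
    """Evaluate `text LIKE pattern` in a throwaway in-memory SQLite database."""
    with closing(sqlite3.connect(":memory:")) as conn:
        (result,) = conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()
    return bool(result)


if __name__ == "__main__":
    for pattern in ("%ex-ex", "%ex-ex%", "%<artifactId>ex-ex</artifactId>%"):
        print(pattern, matches(pattern, POM), matches(pattern, OTHER))
    # %ex-ex False False   <- anchored at the end, which is </project>
    # %ex-ex% True True    <- substring match, so it also hits ex-ex-utils
    # %<artifactId>ex-ex</artifactId>% True False
```

In the BigQuery query, the corresponding change is content LIKE '%ex-ex%' in the WHERE clause. You could also use the tighter tag-based pattern if partial artifact names should be excluded.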

Upvotes: 1
