Cloud Noob
Cloud Noob

Reputation: 21

BigQuery unexpected keyword

I'm just getting started with Google BigQuery, and have run into issues with my very first query. I'm trying to get a list of Stack Overflow posts since and including 2015-01-01 which have one of several tags. Below is my first pass at the query:

#standardSQL

SELECT
 title,
 body,
 answer_count,
 creation_date,
 tags,
 view_count 

FROM
 `bigquery-public-data.stackoverflow.posts_questions` limit 10

WHERE
 creation_date >= "2015-01-01" AND tags HAVING "terraform" OR "chef" OR "puppet" OR "ansible"

The BigQuery validator is showing the following error message:

Error: Syntax error: Unexpected keyword WHERE at [14:1]

Upvotes: 2

Views: 29504

Answers (3)

Kalu Ubi
Kalu Ubi

Reputation: 1

SELECT
usertype, CONCAT(start_station_name, " to ",end_station_name) AS route, COUNT(*) as num_trips, ROUND(AVG(cast(tripduration as int64)/60),2) AS duration FROM bigquery-public-data.new_york_citibike.citibike_trips GROUP BY start_station_name, end_station_name,usertype, ORDER BY num_trips DESC LIMIT 10

Upvotes: -1

Elliott Brossard
Elliott Brossard

Reputation: 33765

There are a few issues here, but hopefully this will help:

  • LIMIT needs to come last, after the WHERE clause.
  • While HAVING is valid in some contexts--namely as a post-aggregation (i.e. after GROUP BY) filter, it doesn't have the meaning that I think you want here.

With that said this query might be what you want:

#standardSQL
SELECT
  title,
  body,
  answer_count,
  creation_date,
  tags,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE creation_date >= "2015-01-01" AND
  EXISTS (
    SELECT 1 FROM UNNEST(SPLIT(tags, "|")) AS tag
    WHERE tag IN ("terraform", "chef", "puppet", "ansible")
  )
LIMIT 10;

Note that I needed to use SPLIT with the tags column because the tags are separated by the pipe character. Since you get a terabyte of querying for free, try to make the most of it by getting all the results at once rather than using the LIMIT, too.

Upvotes: 0

Graham Polley
Graham Polley

Reputation: 14791

You have a few syntax errors, namely the limit 10 in the wrong place, and using the HAVING keyword incorrectly. I'd also use native timestamp instead of comparing strings:

#standardSQL
SELECT
  title,
  body,
  answer_count,
  creation_date,
  tags,
  view_count
FROM
  `bigquery-public-data.stackoverflow.posts_questions`
WHERE
  creation_date >= TIMESTAMP('2015-01-01')
  AND tags IN ('terraform',
    'chef',
    'puppet',
    'ansible')
LIMIT
  10

Upvotes: 2

Related Questions