Asteroid098
Asteroid098

Reputation: 2825

How to count occurrences of item in bigquery

I am trying to find occurrences of certain phrases in reddit comment bigquery 2015. The phrases were also obtained through running a query. I saved the table results so that the phrases could be accessed (stored in column):

SELECT c
 FROM (
 SELECT title 
 FROM [lucid-bond-156818:bquijob_658cefdf_15a212b8aaf] #the phrases I want to search within reddit comment bigquery
 SELECT COUNT(title) as c in [fh-bigquery:reddit_comments.all_2015]
 )

I'm a beginner in SQL so I'm having trouble fixing it, in the last line I basically want to tell the query to count the titles in the reddit comments 2015 data.

Upvotes: 1

Views: 5769

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

To start with - try below
This query will give you list of titles along with count of comments this title appeared in. (i think this is what you asked for - if i understood you correctly)

#legacySQL
SELECT title, COUNT(1) AS occurrences 
FROM [fh-bigquery:reddit_comments.2005] AS c
CROSS JOIN YourTableWithTitles
WHERE body CONTAINS title
GROUP BY title
ORDER BY occurrences DESC

Above assumes that you have all your titles stored in YourTableWithTitles table in title column

As you can see here, I am using [fh-bigquery:reddit_comments.2005] table instead of [fh-bigquery:reddit_comments.all_2015] view. This is to make query cheap while testing (2005 table has just 1075 rows whereas all_2015 view span over 668 million rows). When you make final version of it - you can switch back to that view

Also, as this view is written in Legacy SQL - the query that uses this view must also be in Legacy SQL, so I stick with it too

As you are beginner - it is not much a difference - but ideally when you get a little bit into it - you should check BigQuery Standard SQL

Upvotes: 2

Related Questions