ZhouW
ZhouW

Reputation: 1207

Bigquery and Pushshift: Time range decorator is not supported on partitioned tables

I am following the post at https://pushshift.io/using-bigquery-with-reddit-data which gives an example SQL query:

SELECT * FROM [pushshift:rt_reddit.comments@-60000-] LIMIT 1;

With this I get the following error:

Time range decorator is not supported on partitioned tables

Other queries fail on that page as well for me. Has anything changed since that was written?

Upvotes: 2

Views: 414

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33745

It looks like the author converted the table to use time-based partitioning since that post was created. You can use a filter on the created_utc column to restrict how much data is read, e.g.:

SELECT * FROM pushshift.rt_reddit.comments WHERE DATE(created_utc) = '2018-06-26';

If you click the green checkmark to the right and below the query editor in the BigQuery UI, you should see something like Processing up to 676.57 MB., which is much less than the size of the entire table. As another example, if you wanted to search through the comments for a particular word, as shown in some of the queries on that page, you might write a query such as:

SELECT COUNT(*), subreddit
FROM pushshift.rt_reddit.comments
WHERE created_utc BETWEEN '2018-05-01' AND '2018-05-31' AND
  LOWER(body) LIKE '%google%'
GROUP BY subreddit
ORDER BY 1 DESC;

Upvotes: 2

Related Questions