Reputation: 1207
I am following the post at https://pushshift.io/using-bigquery-with-reddit-data which gives an example SQL query:
SELECT * FROM [pushshift:rt_reddit.comments@-60000-] LIMIT 1;
With this I get the following error:
Time range decorator is not supported on partitioned tables
Other queries fail on that page as well for me. Has anything changed since that was written?
Upvotes: 2
Views: 414
Reputation: 33745
It looks like the author converted the table to use time-based partitioning since that post was created. You can use a filter on the created_utc
column to restrict how much data is read, e.g.:
SELECT * FROM pushshift.rt_reddit.comments WHERE DATE(created_utc) = '2018-06-26';
If you click the green checkmark to the right and below the query editor in the BigQuery UI, you should see something like Processing up to 676.57 MB.
, which is much less than the size of the entire table. As another example, if you wanted to search through the comments for a particular word, as shown in some of the queries on that page, you might write a query such as:
SELECT COUNT(*), subreddit
FROM pushshift.rt_reddit.comments
WHERE created_utc BETWEEN '2018-05-01' AND '2018-05-31' AND
LOWER(body) LIKE '%google%'
GROUP BY subreddit
ORDER BY 1 DESC;
Upvotes: 2