Reputation: 13
I am trying to analyse the number of VIEWS for Stack Overflow questions over time. I just need a few snapshots (e.g. quarterly) for the last 1 - 2 years, to demonstrate how demand for 'answers' in different technologies is shifting over time.
Does anyone know if one can access historical versions of a public dataset in BigQuery? Google must be keeping track of these databases as they shift and evolve over time.
BigQuery is hosting StackOverflow's dataset, which is up-to-date. While this allows an analysis of when questions are asked and answered, it doesn't tell you how view counts change over time. I have also explored using Badges and the date they are awarded as a means of assessing changes to view counts, but this is too crude.
There is one data-dump from Stack Exchange in the Internet Archive (Mar 2018), but again, this is not enough!
Upvotes: 1
Views: 118
Reputation: 59165
You are lucky, I have the perfect answer for this question: Yes.
To query my archives, you can do something like:
#standardSQL
WITH historic_views AS (
SELECT *, '201703' q
FROM `fh-bigquery.stackoverflow_archive.201703_posts_questions`
UNION ALL
SELECT *, '201706' q
FROM `fh-bigquery.stackoverflow_archive.201706_posts_questions`
UNION ALL
SELECT *, '201709' q
FROM `fh-bigquery.stackoverflow_archive.201709_posts_questions`
UNION ALL
SELECT *, '201712' q
FROM `fh-bigquery.stackoverflow_archive.201712_posts_questions`
UNION ALL
SELECT *, '201803' q
FROM `fh-bigquery.stackoverflow_archive.201803_posts_questions`
UNION ALL
SELECT *, '201806' q
FROM `fh-bigquery.stackoverflow_archive.201806_posts_questions`
)
Upvotes: 1