Semaj
Semaj

Reputation: 13

Accessing BigQuery archived instances of public Stack Overflow datasets

I am trying to analyse the number of VIEWS for Stack Overflow questions over time. I just need a few snapshots (e.g. quarterly) for the last 1 - 2 years, to demonstrate how demand for 'answers' in different technologies is shifting over time.

enter image description here

Does anyone know if one can access historical versions of a public dataset in BigQuery? Google must be keeping track of these databases as they shift and evolve over time.

BigQuery is hosting StackOverflow's dataset, which is up-to-date. While this allows an analysis of when questions are asked and answered, it doesn't tell you how view counts change over time. I have also explored using Badges and the date they are awarded as a means of assessing changes to view counts, but this is too crude.

There is one data-dump from Stack Exchange in the Internet Archive (Mar 2018), but again, this is not enough!

Upvotes: 1

Views: 118

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59165

You are lucky, I have the perfect answer for this question: Yes.

Check: https://towardsdatascience.com/these-are-the-real-stack-overflow-trends-use-the-pageviews-c439903cd1a

To query my archives, you can do something like:

#standardSQL
WITH historic_views AS (
  SELECT *, '201703' q
  FROM `fh-bigquery.stackoverflow_archive.201703_posts_questions` 
  UNION ALL
  SELECT *, '201706' q 
  FROM `fh-bigquery.stackoverflow_archive.201706_posts_questions` 
  UNION ALL
  SELECT *,  '201709' q 
  FROM `fh-bigquery.stackoverflow_archive.201709_posts_questions` 
  UNION ALL
  SELECT *, '201712' q 
  FROM `fh-bigquery.stackoverflow_archive.201712_posts_questions` 
  UNION ALL
  SELECT *, '201803' q 
  FROM `fh-bigquery.stackoverflow_archive.201803_posts_questions` 
  UNION ALL
  SELECT *, '201806' q 
  FROM `fh-bigquery.stackoverflow_archive.201806_posts_questions` 
)

Upvotes: 1

Related Questions