Johannes

Reputation: 770

List of all questions along with tags on StackOverflow (for NLP tasks)

Since StackOverflow comes with a wealth of questions and user-contributed tags, I am looking at it as an interesting, richly annotated text corpus for NLP (natural language processing) tasks.

Basically, I want to automatically predict question tags based on the question's body. I am sure this can be done to a certain extent, and there are a number of nice use cases, such as tag suggestions (e.g. to make tag usage more consistent), to name just one.

For this I would need a lot of questions, or even better all of them, along with their body text and user tags, to train a tag predictor with machine learning algorithms.

I know there's the StackOverflow API, but the amount of data I can fetch through it seems to be very limited - for good reasons of course.

So the question is: Is there a way to fetch/download all questions along with their user-tags from StackOverflow?

Upvotes: 1

Views: 233

Answers (1)

paxdiablo

Reputation: 881623

You can get the data dump at http://www.clearbits.net/torrents/2076-aug-2012. That release omits the meta sites, a minor oversight that has been fixed in an alternate release, but this does not affect your request.
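
Once you have the dump, extracting question bodies and tags could look roughly like the sketch below. It assumes the dump contains a `Posts.xml` file whose `<row>` elements carry `PostTypeId`, `Body`, and `Tags` attributes, with questions marked by `PostTypeId="1"` and tags stored in the form `<tag1><tag2>`; check the release you download, since its layout may differ. The function name `iter_questions` and the sample data are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed tag encoding in the dump: "<python><xml>" (one <...> group per tag).
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def iter_questions(source):
    """Yield (body, tags) for every question row in a Posts.xml-style stream.

    `source` is a file path or a binary file object. The stream is parsed
    incrementally rather than loaded with a single ET.parse() call.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: PostTypeId == "1" marks a question, "2" an answer.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            body = elem.get("Body", "")
            tags = TAG_PATTERN.findall(elem.get("Tags", ""))
            yield body, tags
        # Release the element's attributes and text once it has been handled.
        elem.clear()


# Tiny made-up sample standing in for the real Posts.xml.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Body="&lt;p&gt;How do I parse XML?&lt;/p&gt;" Tags="&lt;python&gt;&lt;xml&gt;" />
  <row Id="2" PostTypeId="2" Body="&lt;p&gt;Use a streaming parser.&lt;/p&gt;" />
</posts>
"""

questions = list(iter_questions(io.BytesIO(SAMPLE)))
for body, tags in questions:
    print(tags, body)
```

For the real file you would pass the path to `Posts.xml` instead of the in-memory sample. The resulting (body, tags) pairs can then serve as training data for whatever tag predictor you choose.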

Upvotes: 1
