Reputation: 6394
I've considered writing a Java program that parses the XML and inserts it into the database. I've already extracted the compressed Wikipedia pages dump, so I now have it as plain .xml rather than .xml.bz2. I've searched Wikipedia's website but couldn't find anything that explains how to do this. I imagine it shouldn't be a very hard process and ought to be fairly straightforward, which is why I'm asking here :)
Reputation: 50368
The .bz2 suffix denotes bzip2 compression. If you're on Linux or another Unix-like OS, you probably already have a bzip2 decompressor installed; if you're on Windows, you can download one here.
Note that there are Java libraries that let you read bzip2-compressed streams directly without the need for an external decompressor. One of them can be found here.
Edit: Wait, I think I misread your question. It seems you've already managed to decompress the XML dump, and now you want to know what to do with it. In that case, take a look at mwdumper, a Java tool from the MediaWiki project for importing XML dumps into a MediaWiki database.
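If you would still rather write the Java importer you mentioned, a minimal sketch is to stream the dump with the JDK's built-in StAX parser (javax.xml.stream), so the multi-gigabyte file never has to fit in memory. The names DumpParser, Page and parsePages are hypothetical. The sketch assumes the standard MediaWiki export layout (page elements containing a title and a revision with a text element) and needs Java 16+ for the record. It deliberately leaves out the database insert because the question doesn't describe the target schema; each page is handed to a callback instead.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams <page> elements out of a MediaWiki XML dump.
public class DumpParser {

    /** One page from the dump: its title and the wikitext of its revision. */
    public record Page(String title, String text) {}

    /**
     * Reads the dump event by event and passes each completed page to sink.
     * Assumes the usual export layout: <page><title/>...<revision><text/></revision></page>.
     * If a page holds several revisions, the text of the last one wins.
     * Returns the number of pages emitted.
     */
    public static int parsePages(Reader in, Consumer<Page> sink) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int count = 0;
        try {
            XMLStreamReader r = factory.createXMLStreamReader(in);
            String title = null;
            String text = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Local names ignore the dump's default xmlns namespace.
                    switch (r.getLocalName()) {
                        case "page" -> { title = null; text = null; }
                        // getElementText() reads text-only content and unescapes entities.
                        case "title" -> title = r.getElementText();
                        case "text" -> text = r.getElementText();
                        default -> { }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(r.getLocalName())) {
                    // A full page has been read: hand it off (e.g. to a DB batch).
                    sink.accept(new Page(title, text));
                    count++;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed dump XML", e);
        }
        return count;
    }
}
```

To run it on the real dump, you could pass a UTF-8 reader over the extracted .xml file (for example from Files.newBufferedReader) and, inside the callback, add each page to a JDBC PreparedStatement batch that you flush every few thousand rows. The callback design keeps parsing separate from whatever table layout you end up using.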