ozz

Reputation: 183

Export articles in recent changes to an XML dump

I am looking for a way to dump (in XML format, including templates) every article that was edited in the last hour.

I started with iMacros for Firefox. Getting the list of articles now works fine. Currently I have trouble with article names that include spaces or German umlauts.

e.g. Eidgen%C3%B6ssische_Konstruktionswerkst%C3%A4tte_K%2BW_C-35

How can I convert it to the "real" article name?

Upvotes: 0

Views: 92

Answers (1)

brightbyte

Reputation: 991

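The title you are seeing is encoded for use in a URL (percent-encoding). Your programming language should provide a standard method for decoding these, e.g. "urldecode" in PHP, "decodeURIComponent" in JavaScript, "urllib.parse.unquote" in Python ("urllib.unquote" in Python 2), etc. Note that MediaWiki also writes spaces in titles as underscores in URLs, so you may want to turn those back into spaces after decoding.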
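As a minimal sketch of that decoding step in Python 3, the snippet below uses the standard library's urllib.parse.unquote on the title from the question. The helper name decode_title is only an illustration, and replacing underscores with spaces assumes you want the display form of the title.

```python
from urllib.parse import unquote


def decode_title(url_title: str) -> str:
    """Turn a URL-encoded MediaWiki title into its readable form."""
    # unquote() decodes %XX escapes as UTF-8 by default; unlike
    # unquote_plus() it leaves a literal "+" untouched.
    decoded = unquote(url_title)
    # MediaWiki uses "_" in URLs where the title has a space.
    return decoded.replace("_", " ")


encoded = "Eidgen%C3%B6ssische_Konstruktionswerkst%C3%A4tte_K%2BW_C-35"
print(decode_title(encoded))
# prints: Eidgenössische Konstruktionswerkstätte K+W C-35
```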

But you shouldn't need to do this at all if you fetch the titles of the changed pages via the MediaWiki API. See this query for Wikipedia, for example: https://de.wikipedia.org/w/api.php?action=query&list=recentchanges&format=xml
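A rough sketch of the API route in Python 3 might look like the following. It asks list=recentchanges for titles back to one hour ago (via rcend) and returns them as plain, already decoded strings. The function names, the User-Agent string, and the choice of JSON over XML are assumptions made for illustration; result continuation for busy wikis is left out.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

API_URL = "https://de.wikipedia.org/w/api.php"


def build_recentchanges_url(now=None, hours=1):
    """Build a recentchanges query covering the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    oldest = now - timedelta(hours=hours)
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|timestamp",
        # Results are listed newest first, so rcend is the oldest
        # timestamp we still want to see.
        "rcend": oldest.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rclimit": "max",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Return the unique page titles from a parsed API response, in order."""
    titles = []
    for change in response.get("query", {}).get("recentchanges", []):
        if change["title"] not in titles:
            titles.append(change["title"])
    return titles


def fetch_recent_titles():
    """Fetch one batch of recent changes (no continuation handling)."""
    request = urllib.request.Request(
        build_recentchanges_url(),
        headers={"User-Agent": "recent-changes-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_titles(json.load(resp))
```

Calling fetch_recent_titles() returns a list of titles such as "Eidgenössische Konstruktionswerkstätte K+W C-35" with no URL encoding to undo, since the API delivers them as ordinary text.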

Upvotes: 1
