Reputation: 7611
I'd like to submit two forms on the same page in sequence with curl in bash. http://en.wikipedia.org/w/index.php?title=Special:Export contains two forms: one to populate a list of pages given a Wikipedia category, and another to fetch XML data for that list.
Using curl in bash, I can submit the first form on its own, which returns an HTML file with the pages field populated (though I can't use that form directly, since the returned page is local rather than on the Wikipedia server):
curl -d "addcat=1&catname=Works_by_Leonardo_da_Vinci&curonly=1&action=submit" http://en.wikipedia.org/w/index.php?title=Special:Export -o "somefile.html"
And I can submit the second form while specifying a page, to get the XML:
curl -d "pages=Mona_Lisa&curonly=1&action=submit" http://en.wikipedia.org/w/index.php?title=Special:Export -o "output.xml"
...but I can't figure out how to combine the two steps, or pipe one into the other, so that I get XML for all the pages in a category, as I do when I perform the two steps manually. http://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export seems to suggest that this is possible; any ideas? I don't have to use curl or bash.
Upvotes: 0
Views: 1292
Reputation: 244767
Special:Export is not meant for fully automated retrieval; the API is. For example, you can get the current text of all pages in Category:Works by Leonardo da Vinci in XML format with a single API request.
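A sketch of such a request with curl, using the MediaWiki API's categorymembers generator together with the export option (the parameter names below are standard API names, not taken verbatim from this answer, so adjust as needed):

# Assumption: standard MediaWiki API parameters; exportnowrap returns the bare export XML
curl "https://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Works_by_Leonardo_da_Vinci&gcmlimit=500&export&exportnowrap" -o "output.xml"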
This won't return pages in subcategories and is limited to the first 500 pages per request (although that's not a problem in this case, and there is a way to access the rest).
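For larger categories, here is a sketch of how the remaining pages could be listed by following the API's continuation values, assuming jq is available (list=categorymembers and cmcontinue are standard API names; the loop itself is illustrative, not part of the original answer):

url="https://en.wikipedia.org/w/api.php"
params="action=query&list=categorymembers&cmtitle=Category:Works_by_Leonardo_da_Vinci&cmlimit=500&format=json"
cont=""
while :; do
  # Fetch one batch of up to 500 category members
  resp=$(curl -s "${url}?${params}${cont}")
  echo "$resp" | jq -r '.query.categorymembers[].title'
  # The API returns continue.cmcontinue until the category is exhausted
  next=$(echo "$resp" | jq -r '.continue.cmcontinue // empty')
  [ -z "$next" ] && break
  cont="&cmcontinue=$(printf '%s' "$next" | jq -sRr @uri)"
done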
Upvotes: 1
Reputation: 26261
Assuming you can parse the output from the first HTML file and generate a list of pages, e.g.

Mona Lisa
The Last Supper

you can pipe that list into a bash loop using read. As a simple example:
$ seq 1 5 | while read x; do echo "I read $x"; done
I read 1
I read 2
I read 3
I read 4
I read 5
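Building on that, a sketch of how the loop could drive the second form, assuming the parsed titles are stored one per line in a file named pages.txt (the file name and the per-page output naming are assumptions for illustration, not from the question):

# Assumption: pages.txt holds one page title per line
while read -r page; do
  # --data-urlencode handles titles containing spaces
  curl --data-urlencode "pages=${page}" \
       -d "curonly=1&action=submit" \
       "http://en.wikipedia.org/w/index.php?title=Special:Export" \
       -o "${page}.xml"
done < pages.txt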
Upvotes: 0