Reputation: 7611
I'd like to submit two forms on the same page in sequence with curl in bash. http://en.wikipedia.org/w/index.php?title=Special:Export contains two forms: one to populate a list of pages given a Wikipedia category, and another to fetch XML data for that list.
Using curl in bash, I can submit the first form on its own, which returns an HTML file with the pages field populated (though I can't use that form directly, since the returned page is local rather than on the Wikipedia server):
curl -d "addcat=1&catname=Works_by_Leonardo_da_Vinci&curonly=1&action=submit" http://en.wikipedia.org/w/index.php?title=Special:Export -o "somefile.html"
And I can submit the second form while specifying a page, to get the XML:
curl -d "pages=Mona_Lisa&curonly=1&action=submit" http://en.wikipedia.org/w/index.php?title=Special:Export -o "output.xml"
...but I can't figure out how to combine the two steps, or pipe one into the other, so that I get XML for all the pages in a category, as I do when I perform the two steps manually. http://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export seems to suggest that this is possible; any ideas? I don't have to use curl or bash.
Upvotes: 0
Views: 1292
Reputation: 244767
Special:Export is not meant for fully automated retrieval; the API is. For example, you can get the current text of all pages in Category:Works by Leonardo da Vinci in XML format with a single API request.
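A sketch of such a request with curl, using the MediaWiki API's categorymembers generator together with the export option (the parameter names below are standard API names, not taken verbatim from this answer, so adjust as needed):

# Assumption: standard MediaWiki API parameters; exportnowrap returns the bare export XML
curl "https://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Works_by_Leonardo_da_Vinci&gcmlimit=500&export&exportnowrap" -o "output.xml"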
This won't return pages in subcategories and is limited to the first 500 pages per request (although that's not a problem in this case, and there is a way to access the rest).
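For larger categories, here is a sketch of how the remaining pages could be listed by following the API's continuation values, assuming jq is available (list=categorymembers and cmcontinue are standard API names; the loop itself is illustrative, not part of the original answer):

url="https://en.wikipedia.org/w/api.php"
params="action=query&list=categorymembers&cmtitle=Category:Works_by_Leonardo_da_Vinci&cmlimit=500&format=json"
cont=""
while :; do
  # Fetch one batch of up to 500 category members
  resp=$(curl -s "${url}?${params}${cont}")
  echo "$resp" | jq -r '.query.categorymembers[].title'
  # The API returns continue.cmcontinue until the category is exhausted
  next=$(echo "$resp" | jq -r '.continue.cmcontinue // empty')
  [ -z "$next" ] && break
  cont="&cmcontinue=$(printf '%s' "$next" | jq -sRr @uri)"
done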
Upvotes: 1
Reputation: 26261
Assuming you can parse the output from the first HTML file and generate a list of pages, e.g.

Mona Lisa
The Last Supper

you can pipe that list into a bash loop using read. As a simple example:
$ seq 1 5 | while read x; do echo "I read $x"; done
I read 1
I read 2
I read 3
I read 4
I read 5
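Building on that, a sketch of how the loop could drive the second form, assuming the parsed titles are stored one per line in a file named pages.txt (the file name and the per-page output naming are assumptions for illustration, not from the question):

# Assumption: pages.txt holds one page title per line
while read -r page; do
  # --data-urlencode handles titles containing spaces
  curl --data-urlencode "pages=${page}" \
       -d "curonly=1&action=submit" \
       "http://en.wikipedia.org/w/index.php?title=Special:Export" \
       -o "${page}.xml"
done < pages.txt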
Upvotes: 0