meetar

Reputation: 7611

bash/curl: two-step web form submission

I'd like to submit two forms on the same page in sequence with curl in bash. http://en.wikipedia.org/w/index.php?title=Special:Export contains two forms: one to populate a list of pages given a Wikipedia category, and another to fetch XML data for that list.

Using curl in bash, I can submit the first form on its own. This returns an HTML file with the pages field populated, but I can't use it, because it is a local copy rather than a page on the Wikipedia server:

curl -d "addcat=1&catname=Works_by_Leonardo_da_Vinci&curonly=1&action=submit" "http://en.wikipedia.org/w/index.php?title=Special:Export" -o "somefile.html"

And I can submit the second form while specifying a page, to get the XML:

curl -d "pages=Mona_Lisa&curonly=1&action=submit" "http://en.wikipedia.org/w/index.php?title=Special:Export" -o "output.xml"

...but I can't figure out how to combine the two steps, or pipe the one into the other, to return XML for all the pages in a category, like I get when I perform the two steps manually. http://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export seems to suggest that this is possible; any ideas? I don't have to use curl or bash.

Upvotes: 0

Views: 1292

Answers (2)

svick

Reputation: 244767

Special:Export is not meant for fully automatic retrieval. The API is. For example, to get the current text of all pages in Category:Works by Leonardo da Vinci in XML format, you can use this URL:

http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=categorymembers&gcmtitle=Category:Works_by_Leonardo_da_Vinci&prop=revisions&rvprop=content&gcmlimit=max

This won't return pages in subcategories, and it is limited to the first 500 pages (although that's not a problem in this case, and there is a way to access the rest).
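As a rough sketch, the same request can be wrapped in a small shell helper so the category name is a parameter. The function name `category_api_url` is made up for illustration, and it assumes the category name is passed with underscores instead of spaces and without the `Category:` prefix:

```shell
# Print the API query URL for the members of one category.
# category_api_url is a hypothetical helper name, not part of curl or MediaWiki.
category_api_url() {
  # $1: category name with underscores, e.g. Works_by_Leonardo_da_Vinci
  printf '%s?format=xml&action=query&generator=categorymembers&gcmtitle=Category:%s&prop=revisions&rvprop=content&gcmlimit=max\n' \
    "http://en.wikipedia.org/w/api.php" "$1"
}

# Usage (performs a network request):
#   curl -s "$(category_api_url Works_by_Leonardo_da_Vinci)" -o output.xml
```

The URL is quoted when it is handed to curl because it contains `?` and `&`, which the shell would otherwise interpret.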

Upvotes: 1

Foo Bah

Reputation: 26261

Assuming you can parse the output from the first HTML file and generate a list of pages, e.g.

Mona Lisa
The Last Supper

you can pipe that list into a bash loop using read. As a simple example:

$ seq 1 5 | while IFS= read -r x; do echo "I read $x"; done
I read 1
I read 2
I read 3
I read 4
I read 5
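Applied to the question, the same pattern might look like the sketch below. It assumes the titles have already been extracted into a file with one title per line (`titles.txt` is a hypothetical name), and `normalize_titles` is an illustrative helper, not an existing tool:

```shell
# Read page titles from stdin, one per line, and print them with spaces
# replaced by underscores. normalize_titles is a hypothetical helper name.
normalize_titles() {
  while IFS= read -r title; do
    [ -n "$title" ] || continue              # skip blank lines
    printf '%s\n' "$title" | tr ' ' '_'      # "Mona Lisa" becomes "Mona_Lisa"
  done
}

# Usage (performs one network request per title; titles.txt is assumed):
#   normalize_titles < titles.txt | while IFS= read -r page; do
#     curl -s --data-urlencode "pages=$page" -d "curonly=1&action=submit" \
#       "http://en.wikipedia.org/w/index.php?title=Special:Export" -o "$page.xml"
#   done
```

Using `--data-urlencode` for the title lets curl handle characters such as `&` in page names. Writing to `$page.xml` is only a convenience and would need adjusting for titles that contain `/`.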

Upvotes: 0
