user1424739

Reputation: 13723

How to download a page with html form (post method) with wget?

I want to use wget to get the result of this page: http://smart.embl-heidelberg.de/smart/batch.pl

I check "Text-only output" on that page and, for "Identifiers", choose a file with the following content.

A0A183

Then I click "Submit query", which leads me to the result page. I know that I should somehow provide the --post-data option to wget in order to download the result page, but I am having difficulty figuring out what its value should be. Could anyone tell me how to work it out? (I've tried the Chrome DevTools Network tab, but I'm not sure how to derive the --post-data value from there.)

I also tried the following, but it generated an empty output file.

~$ cat /tmp/000.txt
A0A183
~/linux/test/perl/library/WWW/Mechanize/bin/mech-dump$ mech-dump --forms http://smart.embl-heidelberg.de/smart/batch.pl

GET http://smart.embl-heidelberg.de/smart/search.cgi
  keywords=keywords...           (text)
  <NONAME>=Search SMART          (submit)

POST http://smart.embl-heidelberg.de/smart/batch.pl (multipart/form-data)
  IDS=                           (textarea)
  SEQS=                          (textarea)
  IDFILE=                        (file)
  SEQFILE=                       (file)
  TEXTONLY=<UNDEF>               (checkbox) [*<UNDEF>/off|1/Text-only output]
  LOOSE=<UNDEF>                  (checkbox) [*<UNDEF>/off|1/Substring matching for identifiers]
  DO_PFAM=<UNDEF>                (checkbox) [*<UNDEF>/off|DO_PFAM/include PFAM domains]
  INCLUDE_SIGNALP=<UNDEF>        (checkbox) [*<UNDEF>/off|INCLUDE_SIGNALP/include signal peptides]
  <NONAME>=<UNDEF>               (submit)
  <NONAME>=<UNDEF>               (reset)

~$ wget --post-data='IDFILE=/tmp/000.txt&TEXTONLY=1' http://smart.embl-heidelberg.de/smart/batch.pl

Upvotes: 1

Views: 11083

Answers (2)

trying2bhelpful

Reputation: 41

I know this is old, but I found an answer that works with wget.

It requires wget 1.13.4 or higher. See this post: https://superuser.com/questions/86043/linux-command-line-tool-for-uploading-files-over-http-as-multipart-form-data

wget --header="Content-type: multipart/form-data; boundary=FILEUPLOAD" --post-file 000.txt http://smart.embl-heidelberg.de/smart/batch.pl

000.txt

--FILEUPLOAD
Content-Disposition: form-data; name="IDS"


--FILEUPLOAD
Content-Disposition: form-data; name="SEQS"


--FILEUPLOAD
Content-Disposition: form-data; name="IDFILE"; filename="000.txt"
Content-Type: text/plain

A0A183
A0A182
--FILEUPLOAD
Content-Disposition: form-data; name="SEQFILE"; filename=""
Content-Type: application/octet-stream


--FILEUPLOAD
Content-Disposition: form-data; name="TEXTONLY"

1
--FILEUPLOAD--
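
Writing that body by hand is error-prone, so here is a minimal sketch of generating it from a shell script. The variable names (BOUNDARY, WORKDIR, IDFILE, BODY) are my own and purely illustrative. It emits CRLF line endings, which the multipart format formally expects (many servers also tolerate bare LF), and it sends only the two non-empty fields, on the assumption that the server treats absent fields as empty.

```shell
#!/bin/sh
# Sketch: build a multipart/form-data body for use with wget --post-file.
# BOUNDARY, WORKDIR, IDFILE and BODY are illustrative names, not from the answer above.
BOUNDARY=FILEUPLOAD
WORKDIR=$(mktemp -d)          # scratch directory for the two files
IDFILE="$WORKDIR/ids.txt"     # the identifiers to upload
BODY="$WORKDIR/body.txt"      # the generated request body

# Sample identifier list, one identifier per line.
printf 'A0A183\n' > "$IDFILE"

{
  # Part 1: the uploaded identifier file.
  printf '%s\r\n' "--$BOUNDARY"
  printf 'Content-Disposition: form-data; name="IDFILE"; filename="%s"\r\n' "$(basename "$IDFILE")"
  printf 'Content-Type: text/plain\r\n\r\n'
  cat "$IDFILE"
  printf '\r\n'
  # Part 2: the "Text-only output" checkbox.
  printf '%s\r\n' "--$BOUNDARY"
  printf 'Content-Disposition: form-data; name="TEXTONLY"\r\n\r\n'
  printf '1\r\n'
  # Closing boundary: note the trailing "--".
  printf '%s\r\n' "--$BOUNDARY--"
} > "$BODY"

# The request itself is left commented out so the script has no network side effects:
# wget --header="Content-Type: multipart/form-data; boundary=$BOUNDARY" \
#      --post-file "$BODY" -O result.txt http://smart.embl-heidelberg.de/smart/batch.pl
```

The boundary string must not occur anywhere inside the uploaded file, so pick something less likely than FILEUPLOAD if your identifiers could collide with it.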

Upvotes: 0

Al Pacifico

Reputation: 890

How about:

wget --post-data='IDS=A0A183&TEXTONLY=1' http://smart.embl-heidelberg.de/smart/batch.pl
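
This sidesteps the file upload by sending the identifier through the IDS textarea, which is an ordinary form field. If you have several identifiers in a file, something along these lines might build the --post-data string for you. It is only a sketch: make_post_data is a hypothetical helper name, and it assumes that the server accepts one identifier per line in IDS and that the identifiers are plain alphanumerics needing no further URL escaping.

```shell
#!/bin/sh
# Hypothetical helper: turn a file of identifiers (one per line)
# into a URL-encoded string suitable for wget --post-data.
make_post_data() {
  # Join the lines with commas, then swap each comma for %0A (an encoded newline).
  # Assumes the identifiers contain no commas or other characters needing escaping.
  ids=$(paste -s -d ',' "$1" | sed 's/,/%0A/g')
  printf 'IDS=%s&TEXTONLY=1' "$ids"
}
```

You would then call it as wget --post-data="$(make_post_data /tmp/000.txt)" -O result.txt http://smart.embl-heidelberg.de/smart/batch.pl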

Upvotes: 2
