Reputation: 264
I am trying to automate a data downloading process. For this purpose, my goal is to extract (using bash commands) the .zip from a redirection link that could be seen on display here: https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303
I have seen that people suggest the -L
tag with curl
for redirections, but it doesn't seem to work for my case. The specific command I have tried is:
curl -L -o output.zip https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip
The command file output.zip
shows that the extracted .zip file is actually a HTML document text
. On the other hand, clicking the redirection link (used inside curl
command) downloads the extracted folder automatically via a browser.
Any ideas, tips, or suggestions on what I should try (or whether this is possible or not) will be highly appreciated!
Upvotes: 1
Views: 2774
Reputation: 1459
If you execute curl with the --verbose
option you can see that it is a cookie related problem. The cookie engine needs to be enabled. You can download the desired file as follows:
curl -b cookies.txt -L https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip -o test.zip
It doesn't matter if the file provided with the -b option doesn't exist. We just need to activate the cookie engine.
Refer to Send cookies with curl and Save cookies between two curl requests for futher information.
Upvotes: 1
Reputation: 3183
You can download that file with wget
on Linux
$ wget https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip
$ unzip Sambanis_Aug_06.zip
Archive: Sambanis_Aug_06.zip
inflating: Sambanis (Aug 06).dta
inflating: Sambanis Appendix (Aug 06).pdf
Upvotes: 1