Reputation: 727
I'm trying to get text from multiple PubMed papers using wget, but it seems the NCBI website doesn't allow this. Are there any alternatives?
Bernardos-MacBook-Pro:pangenome_papers_pubmed_result bernardo$ wget -i ./url.txt
--2016-05-04 10:49:34-- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560400/
Resolving www.ncbi.nlm.nih.gov... 130.14.29.110, 2607:f220:41e:4290::110
Connecting to www.ncbi.nlm.nih.gov|130.14.29.110|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-04 10:49:34 ERROR 403: Forbidden.
--2016-05-04 10:49:34-- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547177/
Reusing existing connection to www.ncbi.nlm.nih.gov:80.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-04 10:49:34 ERROR 403: Forbidden.
Upvotes: 15
Views: 60420
Reputation: 1
I had the same problem. I made sure my Mozilla browser was open and copied the download URL from my Linux virtual machine rather than from Windows. I'm not an expert, but it fixed the problem for me.
Upvotes: 0
Reputation: 4118
Maybe you should try enclosing the URL in double quotes, like this:
wget "your_url"
Upvotes: -1
Reputation: 16990
I was getting "ERROR 403: Forbidden" when trying to download files with wget from GitHub (which actually redirects to s3.amazonaws.com). But it only happened when using:
wget -N / --timestamping
This option downloads a remote file only if it is newer than the local copy of the file.
Apparently, AWS S3 forbade the timestamp check. Removing the -N flag solved it.
Note that you can also avoid timestamp checking by using -O / --output-document=FILE, or by downloading to a different directory (one that does not contain the file yet) with -P / --directory-prefix=PREFIX.
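The two workarounds above could be wrapped roughly like this. It is only a sketch: the function names fetch_as and fetch_into and the WGET override variable are made up for illustration, and whether a given server accepts the request is not guaranteed.

```shell
# Hypothetical helpers (names are illustrative, not part of wget).
# WGET can be overridden, e.g. WGET=echo for a dry run; it defaults to wget.

# fetch_as URL FILE: write the download to an explicit output file (-O),
# without using -N / --timestamping.
fetch_as() {
    "${WGET:-wget}" -O "$2" "$1"
}

# fetch_into URL DIR: save the download under a directory prefix (-P),
# e.g. one that does not contain the file yet.
fetch_into() {
    "${WGET:-wget}" -P "$2" "$1"
}
```

For example, fetch_into "your_url" ./fresh_dir would save the file under ./fresh_dir, where "your_url" is a placeholder for the real download link.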
Upvotes: 0
Reputation: 1740
Set a custom User-Agent like this:
wget --user-agent="Mozilla" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560400/
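To apply the same idea to the whole list from the question, something like the following might work. It is a sketch under assumptions: fetch_list and the WGET override are hypothetical names, the one-second --wait is an arbitrary politeness delay rather than anything NCBI specifies, and the server may still refuse bulk requests.

```shell
# Hypothetical helper: fetch every URL listed in a file with a custom User-Agent.
# WGET can be overridden (e.g. WGET=echo for a dry run); it defaults to wget.
fetch_list() {
    # --user-agent : send "Mozilla" instead of wget's default identifier
    # --wait=1     : pause one second between requests (assumed delay)
    # -i FILE      : read the URLs to download from FILE
    "${WGET:-wget}" --user-agent="Mozilla" --wait=1 -i "$1"
}
```

With the file from the question, the call would be fetch_list ./url.txt.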
Upvotes: 36