Reputation: 1817
I want to download a particular section of a website. I am following wget - Download a sub directory. The problem is that this section does not have a single URL of its own: the pages look like http://grephysics.net/ans/0177/*
where * is a number from 1 to 100, and I can't just pass http://grephysics.net/ans/0177 to wget.
How do I download these 100 webpages so that they link to each other (i.e. the Previous and Next buttons should point to the local copies)?
Upvotes: 1
Views: 3355
Reputation: 1182
I think this is what you need:
wget -p -k http://grephysics.net/ans/0177/{1..100}
Explanation:
-k: rewrites links to point to local assets
-p: gets all images, JS, CSS, etc. needed to display the page
{1..100}: specifies a range of URLs to download; in your case, the pages labelled 1 to 100
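If you want to see exactly which URLs wget will be asked to fetch, a quick sanity check (just an illustration, not a required step) is to let the shell expand the range with echo first; nothing is downloaded here:
echo http://grephysics.net/ans/0177/{1..100}
# prints http://grephysics.net/ans/0177/1 ... http://grephysics.net/ans/0177/100 on one line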
Why didn't recursive downloading work?
The link you posted was a good first resource, and probably what most people would want. But the way wget downloads recursively is by fetching the first page specified (i.e. the root) and then following links to child pages. The way grephysics is set up, however, http://grephysics.net/ans/0177 leads to a 404, so there are no links for wget to follow to download the child pages.
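You can verify this yourself, assuming your wget build has the --spider option (it checks URLs without saving anything):
wget --spider http://grephysics.net/ans/0177     # should report a 404, as described above
wget --spider http://grephysics.net/ans/0177/1   # an individual answer page, which does exist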
If your wget doesn't support {}
You can still get the same results by using the following command:
for i in {1..100}; do echo $i; done | wget -p -k -B http://grephysics.net/ans/0177/ -i -
Explanation
for i in {1..100}; ...: prints the values 1 to 100
|: for anyone who hasn't seen this, we are piping the output of the previous command into the input of the next one
-p: gets all images, JS, CSS, etc. needed to display the page
-k: rewrites the links to point to the local copies
-B: specifies the base URL to use with the -i option
-i: reads a list of URLs to fetch from a file; since we specified the 'file' -, it reads from stdin
So, we read in the values 1 to 100, append them to our base URL http://grephysics.net/ans/0177/, fetch all of those URLs and the assets that go with them, and then rewrite the links so we can browse offline.
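One caveat: the {1..100} range is actually expanded by the shell (e.g. bash or zsh), not by wget itself, so if brace expansion is what's missing on your system, a minimal sketch that avoids it entirely is to generate the numbers with seq (assuming seq is available) and feed them to the same wget invocation:
seq 1 100 | wget -p -k -B http://grephysics.net/ans/0177/ -i -
This behaves like the for loop above: each number is read as a relative URL and resolved against the base given by -B.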
Upvotes: 3