Reputation: 3659
I tried using HTTrack to download my phpBB forum, but no matter what setup I use, I cannot stop it from downloading the entire Wikipedia site as well, along with many other websites that are linked somewhere in the forum...
The best I managed was to make it download the index page only, but that's not good either.
I thought that setting
+forum.mysite.com/*
in Options -> Scan Rules would do the trick, but it went on to download all of Wikipedia again :(
Upvotes: 7
Views: 11293
Reputation: 1
For the GUI version: in the scan-rule filters, add exclusions for every downloaded site you don't need; their hostnames can be copied from the subfolder names in the download folder. For example:
+*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
+meNeedSite.com/* +forum.mysite.com/*
-meNotNeedSite.com/* -fiu-vro.wikipedia.org/* -fj.wikipedia.org/* -fo.wikipedia.org/* -fonts.googleapis.com/* -fonts.gstatic.com/* -foundation.mozilla.org/* -fr.wikipedia.org/* -frr.wikipedia.org/* -ftp.mozilla.org/* -fur.wikipedia.org/* -fy.wikipedia.org/* -ga.wikipedia.org/* -gd.wikipedia.org/* -gl.wikipedia.org/* -glk.wikipedia.org/* -gmpg.org/* -gn.wikipedia.org/* -ha.wikipedia.org/* -hacks.mozilla.org/* -he.wikipedia.org/* -hi.wikipedia.org/* -hr.wikipedia.org/* -hsb.wikipedia.org/* -hu.wikipedia.org/* -human.spbstu.ru/* -hy.wikipedia.org/* -hyw.wikipedia.org/* -ia.wikipedia.org/* -id.google.com/* -id.wikipedia.org/* -ie.wikipedia.org/* -ilo.wikipedia.org/* -images.ctfassets.net/* -is.wikipedia.org/* -it.wikipedia.org/* -ja.wikipedia.org/* -jv.wikipedia.org/* -ka.wikipedia.org/* -kab.wikipedia.org/* -kk.wikipedia.org/* -kn.wikipedia.org/* -ko.wikipedia.org/* -krc.wikipedia.org/* -ks.wikipedia.org/* -ku.wikipedia.org/* -ky.wikipedia.org/* -la.wikipedia.org/* -labs.mozilla.org/* -lad.wikipedia.org/* -lb.wikipedia.org/* -learning.mozilla.org/* -lez.wikipedia.org/* -lij.wikipedia.org/* -lmo.wikipedia.org/* -ln.wikipedia.org/* -lo.wikipedia.org/* -lt.wikipedia.org/* -lv.wikipedia.org/*
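If you use the command-line version instead of the GUI, the same whitelist idea can be expressed as scan rules on the httrack command itself. This is only a sketch: the forum URL is taken from the question, the output path is made up, and filter ordering/spelling should be verified against `httrack --help` (later rules override earlier ones, so the blanket exclusion comes first):

```shell
# Exclude everything by default, then re-allow only the forum and
# common asset types. Quoting keeps the shell from expanding the globs.
httrack "http://forum.mysite.com/" \
    -O /path/to/mirror \
    "-*" \
    "+forum.mysite.com/*" \
    "+*.png" "+*.gif" "+*.jpg" "+*.css" "+*.js"
```

This avoids maintaining a long per-site blacklist like the one above, since anything not explicitly allowed is skipped.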
Upvotes: -1
Reputation: 15
Try:
Maximum mirroring depth = 1 (use 2 if 1 doesn't work)
and
Maximum external depth = 0 (this is what worked for me!)
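For the command-line version, the corresponding options should be `-rN` for the mirroring depth and `%eN` for the external depth (this is from my reading of the httrack manual, so double-check with `httrack --help`):

```shell
# -r2 : maximum mirroring depth of 2
# %e0 : external depth 0, i.e. never follow links off-site
httrack "http://forum.mysite.com/" -O /path/to/mirror -r2 %e0 "+forum.mysite.com/*"
```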
Upvotes: 1
Reputation: 476
I would try:
-a  stay on the same address (--stay-on-same-address)
-d  stay on the same principal domain (--stay-on-same-domain)
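Put together as a full command (with the caveat that I have not re-tested the long option spellings against `httrack --help`), that would look something like:

```shell
# --stay-on-same-address keeps the crawl on forum.mysite.com only;
# swap in --stay-on-same-domain to also allow e.g. www.mysite.com.
httrack "http://forum.mysite.com/" -O /path/to/mirror --stay-on-same-address
```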
Upvotes: 1
Reputation: 3659
Found a questionable solution here: Subject: Re: prevent download of external content.
The problem is that external links now point to a placeholder page that looks pretty ugly, which is fixable.
However, embedded content, like YouTube videos, is also replaced by this ugly page :(
At least it is no longer downloading the entire internet...
Upvotes: 1