Or Weinberger

Reputation: 7472

Mirroring sites through cURL

Is it possible to 'mirror' a website using cURL?

I have www.mysite.com, and I would like it to mirror www.stackoverflow.com.

When a user loads www.mysite.com, I want it to call a cURL function that downloads the www.stackoverflow.com homepage and displays it. Before displaying it, I need some sort of regex to rewrite all the links (including CSS/JS links) to something like www.mysite.com/?page=/questions

I know that things like the search, and of course the 'ask question' features will not work, but the general browsing of the site should be fine, right?

How would you go about doing something like that?

Thanks,

Upvotes: 2

Views: 14802

Answers (3)

Kyle Mathews

Reputation: 3278

wget is very nice for this task.

Just run from your command line:

wget -mkx -e robots=off http://the-site-you-want-to-mirror.com

It'll download all the pages, images, stylesheets, JS files, etc. to a local directory and rewrite all the links so they work locally (-m turns on mirroring, -k converts links for local viewing, -x forces a directory hierarchy).

If it's not your own server, be nice and add -w 2 to wait 2 seconds between requests.

Upvotes: 7

sarnold

Reputation: 104080

Apache's mod_proxy may help you do what you want: deploy an Apache server with mod_proxy as a reverse proxy and mod_proxy_html to rewrite the links. See http://www.apachetutor.org/admin/reverseproxies

But please oh please don't make yet another worthless content scraping site -- use this for good, not evil. :)

Upvotes: 0

helle

Reputation: 11650

Well, you'd better do a redirect.

Or, if you want your own URL shown in the browser, use frames...

UPDATE:

But if you want to change the HTML, load the cURL response into a div. You can rewrite the response first, with PHP for example: $curl_answer = str_replace("www.stackoverflow.com", "www.mysite.com", $curl_answer);
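A plain string replace only swaps the host name; it won't produce the www.mysite.com/?page=/questions form from the question. Here is a rough sketch of that rewrite, written in Python rather than PHP just to keep it short. The names MIRROR_BASE, SOURCE_HOST and rewrite_links are made up for illustration, and the regex only looks at quoted href/src attributes, so url() references in CSS and links built by JavaScript are not handled.

```python
import re
from urllib.parse import quote, urlsplit

# Assumed values, taken from the question; adjust to your own setup.
MIRROR_BASE = "http://www.mysite.com/?page="
SOURCE_HOST = "www.stackoverflow.com"

# Matches href="..." or src="..." (single or double quotes), but not data-src etc.
ATTR_RE = re.compile(
    r'''(?<![\w-])(?P<attr>href|src)=(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def rewrite_links(html: str) -> str:
    """Point links to SOURCE_HOST (and root-relative links) at the mirror."""

    def replace(match: re.Match) -> str:
        parts = urlsplit(match.group("url"))

        # Skip mailto:, javascript:, and other non-HTTP schemes.
        if parts.scheme not in ("", "http", "https"):
            return match.group(0)

        if parts.netloc:
            # Absolute URL: only rewrite links to the mirrored host.
            if parts.netloc.lower() != SOURCE_HOST:
                return match.group(0)
            path = parts.path or "/"
        elif parts.path.startswith("/"):
            # Root-relative URL such as /questions.
            path = parts.path
        else:
            # Fragment-only and document-relative links are left alone here.
            return match.group(0)

        if parts.query:
            path += "?" + parts.query

        # Percent-encode so the original query string survives inside ?page=.
        new_url = MIRROR_BASE + quote(path, safe="/")
        q = match.group("q")
        return f"{match.group('attr')}={q}{new_url}{q}"

    return ATTR_RE.sub(replace, html)
```

You would pass the HTML returned by cURL through rewrite_links before echoing it, and have the ?page= handler fetch the requested path from the source host. For anything beyond a quick experiment, a real HTML parser is likely a safer choice than a regex.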

Upvotes: 1
