irl_irl

Reputation: 3975

How do you download a website?

Search engine bots crawl the web and download each page they visit for analysis, right?

How exactly do they download a page, and how do they store the pages?

I am asking because I want to run an analysis on a few web pages. I could scrape each page by going to its address, but wouldn't it make more sense to download the pages to my computer and work on them from there?

Upvotes: 3

Views: 337

Answers (3)

athspk

Reputation: 6762

Try HTTrack

About the way they do it:
Indexing starts from a designated starting point (an entry page, if you prefer). From there, the spider recursively follows every hyperlink it finds, up to a given depth.
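As a rough illustration of that idea (a minimal sketch, not how HTTrack itself is implemented), a depth-limited crawler can be written in a few lines of Python. The requests and beautifulsoup4 packages are assumed to be installed, and example.com is a placeholder:

    import urllib.parse
    import requests
    from bs4 import BeautifulSoup

    def crawl(url, depth, seen=None):
        # Track visited URLs so cycles in the link graph don't loop forever.
        if seen is None:
            seen = set()
        if depth < 0 or url in seen:
            return
        seen.add(url)
        html = requests.get(url, timeout=10).text
        # ... store or analyse `html` here ...
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            # Resolve relative links against the current page's URL, then recurse.
            crawl(urllib.parse.urljoin(url, a["href"]), depth - 1, seen)

    crawl("http://example.com/", depth=2)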

Search engine spiders work like this as well, but there are many of them crawling simultaneously, and other factors count too. For example, a newly created post here on SO will be picked up by Google very quickly, while an update to a low-traffic website may not be picked up for days.

Upvotes: 7

Paul Tomblin

Reputation: 182782

wget --mirror
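
For context (these are standard GNU wget options, not something the answer spells out): --mirror turns on recursion and timestamping, shorthand for -r -N -l inf --no-remove-listing. Adding --convert-links and --page-requisites usually yields a copy that is browsable offline; the URL here is a placeholder:

    wget --mirror --convert-links --page-requisites http://example.com/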

Upvotes: 8

Siriss

Reputation: 3767

You can use the developer tools built into Firefox (or the Firebug extension) and Chrome to examine how the page works. As far as downloading pages directly, I am not sure. You could try viewing the page source in your browser and then copying and pasting the code.
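
If manual copy-and-paste gets tedious, the same thing can be scripted. A minimal sketch using only Python's standard library (the URL and output filename are placeholders):

    import urllib.request

    url = "http://example.com/"   # placeholder: page to download
    out = "page.html"             # placeholder: local file to write

    with urllib.request.urlopen(url) as resp:
        # Fall back to UTF-8 if the server doesn't declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset)

    with open(out, "w", encoding="utf-8") as f:
        f.write(html)             # saved copy, ready for offline analysis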

Upvotes: 2
