Reputation: 467
How does crawler or spider in a search engine works
Upvotes: 1
Views: 510
Reputation: 1481
The world wide web is basically a connected directed graph of web documents,images,multimedia files etc. .Each node of the graph is a component of a web page-for example-a web page consists of image,text,video etc, all of them are linked.Crawler traverses the graph using Breadth First Search using links in web pages.
Upvotes: 3
Reputation: 30920
Specifically, you need at least some of the following components:
Crawlers needs to be efficient at working together from different starting points, speed, memory usage and using a high number of threads/processes. I/O is key.
Upvotes: 3
Reputation: 420951
From How Stuff Works
How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
Upvotes: 0