Web crawler: fetch only useful HTML content to speed up fetching in PHP

Question

I am designing a web crawler to fetch a list of products from a site. I have tried Simple HTML DOM Parser and file_get_contents() to fetch the HTML and parse it, but fetching the HTML content takes too much time. There is also a lot of parsing overhead because the page is huge. I am looking for a way to fetch only the required part of the HTML to speed up fetching, for example by using the offset and maxlen parameters of file_get_contents(). However, seeking (offset) is not supported with remote files.

 file_get_contents($filename, false, null, 9000, 5000)

Is there any other way to do this?
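
To make the goal concrete, here is a minimal sketch of one possible workaround, not a definitive solution: open the page as a stream, read and discard the first bytes instead of seeking, keep only the wanted bytes, and close the connection early. The helper name read_slice() and the URL in the usage note are hypothetical, and the numbers 9000 and 5000 are just the values from the call above.

```php
<?php
// Hypothetical helper (not part of PHP): return up to $length bytes that
// start at byte $offset of a forward-only stream such as an HTTP stream.
function read_slice($stream, int $offset, int $length): string
{
    // Remote streams cannot seek, so skip the leading bytes by reading
    // them in chunks and throwing them away.
    $toSkip = $offset;
    while ($toSkip > 0 && !feof($stream)) {
        $chunk = fread($stream, min(8192, $toSkip));
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $toSkip -= strlen($chunk);
    }

    // Collect the wanted bytes; fread() may return fewer than requested,
    // so keep reading until the slice is complete or the stream ends.
    $data = '';
    while (strlen($data) < $length && !feof($stream)) {
        $chunk = fread($stream, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }

    return $data;
}
```

It could be used as $fp = fopen('https://example.com/products', 'r'); $html = read_slice($fp, 9000, 5000); fclose($fp); assuming allow_url_fopen is enabled. Note the limits of this approach: the skipped bytes are still downloaded, and only the part of the page after the slice is avoided by closing the stream early. If the server happens to honor HTTP range requests, sending a "Range: bytes=9000-13999" header (for example through cURL's CURLOPT_RANGE option) would avoid transferring the skipped part as well, but many dynamically generated pages ignore that header.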

Answers (1)
