Reputation: 5114
I want to read the content of a website and store it in a file using C# and ASP.NET. I know the page can be read with HttpWebRequest, but is it possible to also read the data from all of the links it contains?
For example, if I want to read http://www.msn.com I can give the URL directly and read the home page content; that part is no problem. But the msn.com home page contains many links, and I want to read the content of those pages as well. Is that possible?
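For reference, a minimal sketch of reading a single page with HttpWebRequest and writing it to a file; the URL and output path here are just placeholders:

    using System;
    using System.IO;
    using System.Net;

    class PageDownloader
    {
        static void Main()
        {
            // Placeholder URL and output path; adjust to your environment.
            const string url = "http://www.msn.com";
            const string outputPath = "homepage.html";

            var request = (HttpWebRequest)WebRequest.Create(url);
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                // Read the whole response body and save it to disk.
                string html = reader.ReadToEnd();
                File.WriteAllText(outputPath, html);
            }
        }
    }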
Can somebody give me a starting point for this?
Thanks in advance.
Upvotes: 0
Views: 223
Reputation: 13706
1. Define a queue of URLs.
2. Add the main page URL to the queue.
3. While the queue is not empty:
   3.1 currentUrl = Dequeue()
   3.2 Read the page at currentUrl.
   3.3 Extract all URLs from the current page using a regular expression.
   3.4 Add those URLs to the queue.
You will have to limit the URLs in the queue to some depth or to some domain, otherwise you will end up trying to download the entire internet :) A rough sketch of the loop is below.
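A minimal C# sketch of that loop, assuming WebClient for the downloads, a simple regex for link extraction, and a hypothetical MaxPages limit and file-naming scheme you would tune yourself:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Text.RegularExpressions;

    class SimpleCrawler
    {
        // Hypothetical limit so the crawl stops; tune for your own use.
        const int MaxPages = 50;

        static void Main()
        {
            var queue = new Queue<string>();
            var visited = new HashSet<string>();
            queue.Enqueue("http://www.msn.com");

            using (var client = new WebClient())
            {
                while (queue.Count > 0 && visited.Count < MaxPages)
                {
                    string currentUrl = queue.Dequeue();
                    if (!visited.Add(currentUrl))
                        continue;               // skip pages we have already read

                    string html;
                    try
                    {
                        html = client.DownloadString(currentUrl);
                    }
                    catch (WebException)
                    {
                        continue;               // ignore pages that fail to load
                    }

                    // Store each page's content in its own file.
                    File.WriteAllText("page" + visited.Count + ".html", html);

                    // Extract absolute links with a simple regex.
                    foreach (Match m in Regex.Matches(html,
                             "href\\s*=\\s*[\"'](http[^\"']+)[\"']",
                             RegexOptions.IgnoreCase))
                    {
                        queue.Enqueue(m.Groups[1].Value);
                    }
                }
            }
        }
    }

In practice an HTML parser such as HtmlAgilityPack is more robust than a regex for pulling out links, and you would also want to enqueue only URLs belonging to the domain you are interested in.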
Upvotes: 1