dnyaneshwari

Reputation:

Problems with home-brew web crawler

I have built a web crawler in C++. I am using an API called URLDownloadToFile().

  1. Is there any other API that can be used?
  2. URLDownloadToFile() works well for some URLs but not for others. Can you suggest ways to overcome this problem?

Thanks, Dnyaneshwari C.

Upvotes: 1

Views: 410

Answers (3)

David Sykes

Reputation: 49882

Unless there is a particular reason for sticking with C++, you could well be better off switching to Python and using BeautifulSoup. I've used curl, and it is nice, but all my web stuff is done in Python now.

Upvotes: 0

shoosh

Reputation: 79013

You might want to look at WinINet, which is a simple C API providing a high-level interface to the HTTP network stack. Another option is WinHTTP, which is somewhat more complicated and requires you to deal with COM.

Upvotes: 0

Henrik Hartz

Reputation: 3675

You might want to look into libcurl, which should allow you to pull content using a variety of protocols. It also supports proxies etc., which might be what is giving you problems with specific URLs. See also: http://curl.haxx.se/

Upvotes: 2
