I have built a web crawler in C++ using the URLDownloadToFile() API. It works well for some URLs but fails for others. Can you suggest some ways to overcome this problem? Thanks, Dnyaneshwari C.
Upvotes: 1
Views: 410
Reputation: 49882
Unless there is a particular reason for sticking with C++, you could well be better off switching to Python and using BeautifulSoup. I've used curl, and it is nice, but all my web stuff is done in Python now.
Upvotes: 0
Reputation: 79013
You might want to look at WinINet, which is a simple C API providing a high-level interface to the HTTP network stack. Another option is WinHTTP, which is somewhat more complicated; besides its C API, it also exposes a COM interface (WinHttpRequest).
Upvotes: 0
Reputation: 3675
You might want to look into libcurl, which lets you pull content using a variety of protocols. It also supports proxies etc., which might be what is causing your problems with specific URLs. See also: http://curl.haxx.se/
Upvotes: 2