I need to pull data from roughly 6000 pages of a website. After doing some research, I decided to give WinHTTP a shot. I got this working, but I was doing everything synchronously, so it took a while to complete. I am now attempting to use WinHTTP asynchronously, but I've hit a roadblock. I searched for tutorials and examples, but I could only find the MSDN documentation, which seems overly complex for what I'm doing. With so few resources to go on, I went ahead and gave it a shot:
std::string theSource = "";
char * httpBuffer;
DWORD dwSize = 1;
DWORD dwRecv = 1;

// Open the session in asynchronous mode.
HINTERNET hOpen =
    WinHttpOpen
    (
        L"Example Agent",
        WINHTTP_ACCESS_TYPE_NO_PROXY,
        NULL,
        NULL,
        WINHTTP_FLAG_ASYNC
    );

// Register the status callback for every notification.
WINHTTP_STATUS_CALLBACK theCallback =
    WinHttpSetStatusCallback
    (
        hOpen,
        (WINHTTP_STATUS_CALLBACK) HttpCallback,
        WINHTTP_CALLBACK_FLAG_ALL_NOTIFICATIONS,
        NULL
    );

HINTERNET hConnect =
    WinHttpConnect
    (
        hOpen,
        L"example.org",
        INTERNET_DEFAULT_HTTPS_PORT,
        0
    );

HINTERNET hRequest = NULL;
BOOL allComplete = false;
int theRequest = 1;

// Poll theRequest, which the callback updates, to pick the next step.
while (!allComplete)
{
    if (theRequest == 1)
    {
        hRequest = WinHttpOpenRequest
        (
            hConnect,
            L"GET",
            L"example.html",
            0,
            WINHTTP_NO_REFERER,
            WINHTTP_DEFAULT_ACCEPT_TYPES,
            WINHTTP_FLAG_SECURE
        );
        WinHttpSendRequest
        (
            hRequest,
            WINHTTP_NO_ADDITIONAL_HEADERS,
            0,
            WINHTTP_NO_REQUEST_DATA,
            0,
            0,
            0
        );
    }
    else if (theRequest == 2)
    {
        WinHttpReceiveResponse(hRequest, NULL);
    }
    else if (theRequest == 3)
    {
        // First call only reports the required buffer size in dwSize.
        WinHttpQueryHeaders
        (
            hRequest,
            WINHTTP_QUERY_RAW_HEADERS_CRLF,
            WINHTTP_HEADER_NAME_BY_INDEX,
            NULL,
            &dwSize,
            WINHTTP_NO_HEADER_INDEX
        );
        WCHAR * headerBuffer = new WCHAR[dwSize/sizeof(WCHAR)];
        WinHttpQueryHeaders
        (
            hRequest,
            WINHTTP_QUERY_RAW_HEADERS_CRLF,
            WINHTTP_HEADER_NAME_BY_INDEX,
            headerBuffer,
            &dwSize,
            WINHTTP_NO_HEADER_INDEX
        );
        delete [] headerBuffer;
        dwSize = 1;
        while (dwSize > 0)
        {
            if (!WinHttpQueryDataAvailable(hRequest, &dwSize))
            {
                break;
            }
            httpBuffer = new char[dwSize + 1];
            ZeroMemory(httpBuffer, dwSize + 1);
            if (!WinHttpReadData(hRequest, httpBuffer, dwSize, &dwRecv))
            {
                std::cout << "WinHttpReadData() - Error Code: " << GetLastError() << "\n";
            }
            else
            {
                theSource = theSource + httpBuffer;
            }
            delete [] httpBuffer;
            // Parse the source for the data I'm looking for.
            break;
        }
    }
}
Below is my callback function:
// The context parameter is a DWORD_PTR, matching the WINHTTP_STATUS_CALLBACK prototype.
void CALLBACK HttpCallback(HINTERNET hInternet, DWORD_PTR dwContext, DWORD dwInternetStatus, LPVOID lpvStatusInfo, DWORD dwStatusInfoLength)
{
    switch (dwInternetStatus)
    {
        default:
            std::cout << dwInternetStatus << "\n";
            break;
        case WINHTTP_CALLBACK_STATUS_HANDLE_CREATED:
            std::cout << "Handle created.\n";
            theRequest = 1;
            break;
        case WINHTTP_CALLBACK_STATUS_REQUEST_SENT:
            std::cout << "Request sent.\n";
            theRequest = 2;
            break;
        case WINHTTP_CALLBACK_STATUS_RESPONSE_RECEIVED:
            std::cout << "Response received.\n";
            theRequest = 3;
            break;
    }
}
Note: I've only provided this section of my code since it's the part that pertains to my question/problem. I apologize if a variable declaration is missing.
The above code works for me and does get the information I'm looking for, but only for a single page. At this point I realized I had no idea how to make multiple requests with this method. Again, searching didn't turn up much besides the MSDN articles, which, as far as I can tell, don't include examples that make multiple requests at once. Additionally, the while loop I'm using to open/send/etc. the requests based on theRequest's value seems like a terrible way of doing this. I'd appreciate any other advice to improve my code as well.
In general, here's a summary of my problem: I need to make about 6000 GET requests using WinHTTP asynchronously. I'm not entirely sure how to do this because I'm new to WinHTTP, so I'm looking for the most basic (or possibly efficient) way to work with multiple asynchronous requests.
Upvotes: 3
Views: 4483
Reputation: 69662
Repeat what you are doing in while (!allComplete) { ... } to issue more requests. You can reuse hConnect, but you need to call WinHttpOpenRequest for every resource you request.
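As a rough illustration of opening one request per resource on a shared hConnect, here is a minimal sketch in which each request carries its own state and the callback drives it forward, so no shared theRequest variable or polling loop is needed. The names RequestContext, appendChunk, OnStatus and startRequest are hypothetical and not part of WinHTTP, and the sketch assumes OnStatus has been registered on the session handle with WinHttpSetStatusCallback, as in the question. Treat it as a starting point, not a complete implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <winhttp.h>
#pragma comment(lib, "winhttp.lib")
#endif

// Hypothetical per-request state; WinHTTP hands its address back to the callback.
struct RequestContext {
    std::wstring path;        // resource being fetched
    std::string body;         // response body accumulated so far
    std::vector<char> chunk;  // buffer for the read currently in flight
};

// Append the first `bytes` bytes of the in-flight chunk to the body.
inline void appendChunk(RequestContext& ctx, std::size_t bytes) {
    if (bytes > ctx.chunk.size()) bytes = ctx.chunk.size();
    if (bytes == 0) return;
    ctx.body.append(ctx.chunk.data(), bytes);
}

#ifdef _WIN32
// Status callback: each completion starts the next asynchronous step.
void CALLBACK OnStatus(HINTERNET h, DWORD_PTR context, DWORD status,
                       LPVOID info, DWORD infoLen) {
    RequestContext* ctx = reinterpret_cast<RequestContext*>(context);
    if (!ctx) return;  // notifications for handles without a context
    switch (status) {
    case WINHTTP_CALLBACK_STATUS_SENDREQUEST_COMPLETE:
        if (!WinHttpReceiveResponse(h, NULL)) WinHttpCloseHandle(h);
        break;
    case WINHTTP_CALLBACK_STATUS_HEADERS_AVAILABLE:
        if (!WinHttpQueryDataAvailable(h, NULL)) WinHttpCloseHandle(h);
        break;
    case WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE: {
        DWORD size = *static_cast<DWORD*>(info);
        if (size == 0) { WinHttpCloseHandle(h); break; }  // body is complete
        ctx->chunk.resize(size);
        if (!WinHttpReadData(h, ctx->chunk.data(), size, NULL)) WinHttpCloseHandle(h);
        break;
    }
    case WINHTTP_CALLBACK_STATUS_READ_COMPLETE:
        appendChunk(*ctx, infoLen);  // infoLen is the number of bytes read
        if (!WinHttpQueryDataAvailable(h, NULL)) WinHttpCloseHandle(h);
        break;
    case WINHTTP_CALLBACK_STATUS_REQUEST_ERROR:
        WinHttpCloseHandle(h);
        break;
    case WINHTTP_CALLBACK_STATUS_HANDLE_CLOSING:
        // Parse ctx->body here if the request succeeded, then release the state.
        delete ctx;
        break;
    }
}

// Open and send one GET request on an existing connection handle.
bool startRequest(HINTERNET hConnect, const std::wstring& path) {
    HINTERNET hRequest = WinHttpOpenRequest(hConnect, L"GET", path.c_str(), NULL,
        WINHTTP_NO_REFERER, WINHTTP_DEFAULT_ACCEPT_TYPES, WINHTTP_FLAG_SECURE);
    if (!hRequest) return false;
    RequestContext* state = new RequestContext{path, {}, {}};
    DWORD_PTR ctx = reinterpret_cast<DWORD_PTR>(state);
    // Attach the context first so HANDLE_CLOSING can always free it.
    if (!WinHttpSetOption(hRequest, WINHTTP_OPTION_CONTEXT_VALUE, &ctx, sizeof(ctx))) {
        delete state;
        WinHttpCloseHandle(hRequest);
        return false;
    }
    if (!WinHttpSendRequest(hRequest, WINHTTP_NO_ADDITIONAL_HEADERS, 0,
                            WINHTTP_NO_REQUEST_DATA, 0, 0, ctx)) {
        WinHttpCloseHandle(hRequest);  // HANDLE_CLOSING deletes the context
        return false;
    }
    return true;
}
#endif
```

With this in place, issuing the requests would amount to calling startRequest(hConnect, path) once per page. The sketch leaves out two things you would likely still need: a way to wait until every request has finished before closing hConnect and hOpen (for example a counter decremented in HANDLE_CLOSING), and probably some limit on how many of the 6000 requests are in flight at once. Callbacks may run on worker threads, so anything shared between requests needs synchronization.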
Upvotes: 2