Reputation: 189
I'm using HttpClient.GetStringAsync to make a web scraping app, and the method will be called on over 5000 web pages during a single run. The pages themselves are pretty simple, and all I'm doing is parsing the source to extract some strings, but I'm curious about what sort of a hit this will have on my web bandwidth.
Does GetStringAsync only download the source, or will the embedded resources such as images, Google Maps controls, scripts, etc. have to actually load/run too at some point while the method's doing its thing?
Here's an example of what the Chrome network monitor tool shows when I load one of the pages I'm interested in:
All I care about is the circled figure at the top, since that's my source code. Is that all the GetStringAsync method will need to download, or will other, more data-intensive stuff happen in the background too and bloat this figure?
Upvotes: 1
Views: 211
Reputation: 46
HttpClient is not a web browser (engine), and it does not behave like a web browser.
HttpClient.GetStringAsync will only receive (download) the response for the single URL you request. HttpClient is not going to analyze or parse what that resource might be. From the perspective of HttpClient.GetStringAsync, it's just trying to obtain a response from the server whose payload is supposed to be text data (a string). That's it. Whether the response is some HTML text, some JSON text, or just some letter soup is of no concern to HttpClient, because HttpClient is not a web browser engine.
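If you want to convince yourself of this without touching the network, here is a minimal sketch. It assumes a hypothetical stub handler (CountingHandler, a name invented for this example) that answers every request with a small HTML page referencing an image and a script, and counts how many requests HttpClient actually sends. The URL and the HTML are made up as well.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stub: returns the same HTML for every request and
// records how many requests HttpClient sends through it.
public sealed class CountingHandler : HttpMessageHandler
{
    public int RequestCount;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Interlocked.Increment(ref RequestCount);

        // The page references an image and a script, as a real page would.
        const string html =
            "<html><body><img src=\"/big.png\"><script src=\"/maps.js\"></script></body></html>";

        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(html)
        });
    }
}

public static class Program
{
    public static async Task Main()
    {
        var handler = new CountingHandler();
        using var client = new HttpClient(handler);

        // Made-up URL; the stub never actually goes to the network.
        string source = await client.GetStringAsync("http://example.test/page");

        Console.WriteLine(source.Contains("<img")); // the markup is returned verbatim
        Console.WriteLine(handler.RequestCount);    // only the one request we asked for
    }
}
```

The count stays at 1 even though the returned markup mentions /big.png and /maps.js: nothing in HttpClient reads the HTML, so those resources are never requested unless your own code asks for them.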
In summary:
Does GetStringAsync only download the source, or will the embedded resources such as images, google maps controls, scripts, etc...
It only downloads the source, because HttpClient is not a web browser engine.
Upvotes: 3