Reputation: 267
I want to extract the title, description, and keywords from a series of URLs.
I have this code:
WebClient x = new WebClient();
string pageSource = (x.DownloadString(url));
query.title = Regex.Match(pageSource, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value;
But I do not want to download the whole page, because that is very time-consuming for a series of URLs. Is there any way to get this information without downloading the whole page?
I should mention that I get these URLs from a Google search results page by sending a query to Google.
Upvotes: 0
Views: 2350
Reputation: 37938
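You can request and download a partial result using HttpClient by specifying a Range header. You define how many bytes you want to download and read only that much. Note that a server may ignore the Range header and answer with the full page; reading the response as a stream and stopping after the buffer is full still avoids pulling the whole body: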
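using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        Test().GetAwaiter().GetResult();
    }

    private static async Task Test()
    {
        const string url = "http://google.com";
        const int bytesToRead = 2000;
        using (var httpclient = new HttpClient())
        {
            // Range bounds are inclusive, so 0..bytesToRead-1 asks for exactly bytesToRead bytes
            httpclient.DefaultRequestHeaders.Range = new RangeHeaderValue(0, bytesToRead - 1);
            // ResponseHeadersRead returns as soon as the headers arrive, without buffering the body
            using (var response = await httpclient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
            using (var stream = await response.Content.ReadAsStreamAsync())
            {
                var buffer = new byte[bytesToRead];
                int total = 0, read;
                // a single read may return fewer bytes than requested, so loop until full or end of stream
                while (total < buffer.Length &&
                       (read = await stream.ReadAsync(buffer, total, buffer.Length - total)) > 0)
                {
                    total += read;
                }
                // decode only the bytes actually read (assumes the page is UTF-8)
                var partialHtml = Encoding.UTF8.GetString(buffer, 0, total);
                //extract required info from partial html
            }
        }
    }
}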
The same result could be achieved using the "old" WebClient.
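For the "extract required info from partial html" step, here is a minimal sketch that pulls the title, description, and keywords out of the downloaded prefix with regular expressions. The `HeadInfo` class and its `Title` and `Meta` methods are hypothetical names introduced for illustration, not part of any library. The sketch assumes that each meta tag puts its `name` attribute before `content` and that the tags fall within the bytes you downloaded; otherwise the methods return null and you may need a larger `bytesToRead`.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class HeadInfo
{
    // Returns the decoded <title> text, or null if no complete title tag is present.
    public static string Title(string html)
    {
        var m = Regex.Match(html,
            @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)\s*</title>",
            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups["Title"].Value) : null;
    }

    // Returns the content of <meta name="..." content="...">, or null if not found.
    // Assumption: the name attribute appears before the content attribute.
    public static string Meta(string html, string name)
    {
        var pattern =
            @"<meta\b[^>]*?\bname\s*=\s*[""']" + Regex.Escape(name) + @"[""']" +
            @"[^>]*?\bcontent\s*=\s*(?<q>[""'])(?<Content>.*?)\k<q>";
        var m = Regex.Match(html, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? WebUtility.HtmlDecode(m.Groups["Content"].Value) : null;
    }
}
```

With the `partialHtml` string from the answer, usage would be along the lines of `HeadInfo.Title(partialHtml)`, `HeadInfo.Meta(partialHtml, "description")`, and `HeadInfo.Meta(partialHtml, "keywords")`. A regex is only a rough fit for HTML; if the pages are irregular, an HTML parser run over the same partial string is likely more robust.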
Upvotes: 3