Reputation: 403
for (int i = 0; i < numberoflinks; i++)
{
    string downloadString = client.DownloadString(mainlink + i + ".html");
    var document = new HtmlWeb().Load(url);
    var urls = document.DocumentNode.Descendants("img")
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s));
}
The problem is that HtmlWeb().Load requires a URL, but I want to load the string downloadString, which already contains the HTML content.
Update:
I tried this now:
for (int i = 0; i < numberoflinks; i++)
{
    string downloadString = client.DownloadString(mainlink + i + ".html");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(downloadString);
    var urls = document.DocumentNode.Descendants("img")
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s));
}
But I'm getting an exception on the line:
document.Load(downloadString);
Illegal characters in path
What I'm trying to do is download all the .JPG images from each link. I don't want to save each page to the hard disk first. Instead, I want to download the page content into a string, extract all image links ending in .JPG from that HTML, and then download the JPGs.
Upvotes: 1
Views: 306
Reputation: 6013
You should be able to process a string of HTML using the LoadHtml() method of HtmlDocument.
From the source code:
public void LoadHtml(string html)
Loads the HTML document from the specified string.
param name="html"
String containing the HTML document to load. May not be null.
The Load method expects a file path rather than HTML content, which is the reason for the message about illegal characters in path.
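As a rough sketch of how this could fit the question's goal, the helper below parses an HTML string with LoadHtml() and keeps only the image links ending in .jpg. The class name ImageLinkExtractor and the method name ExtractJpgLinks are made up for illustration, and the case-insensitive extension check is an assumption about what counts as a ".JPG" link.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ImageLinkExtractor
{
    // Returns the src value of every <img> whose URL ends in ".jpg" (any casing).
    public static List<string> ExtractJpgLinks(string html)
    {
        var document = new HtmlDocument();

        // LoadHtml parses the markup held in the string;
        // Load(string) would treat the same string as a file path.
        document.LoadHtml(html);

        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !string.IsNullOrEmpty(s)
                        && s.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Inside the loop from the question, this would be called as ImageLinkExtractor.ExtractJpgLinks(downloadString). Any relative src values would still need to be resolved against the page URL before the images can be downloaded.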
Upvotes: 2