Reputation: 139
I'm trying to go through a web page's source code and add each <img src="http://www.dot.com/image.jpg"> element to an HtmlElementCollection. Then I'm attempting to cycle through each element in the collection with a foreach loop and download the images through their URLs.
Here's what I have so far. My problem right now is that nothing is downloading, and I don't think my elements are being added properly by tag name. If they are, I can't seem to reference them for the download.
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    public void button1_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        string sourceCode = WorkerClass.ScreenScrape(url);
        StreamWriter sw = new StreamWriter("sourceScraped.html");
        sw.Write(sourceCode);
    }

    private void button2_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        WebBrowser browser = new WebBrowser();
        browser.Navigate(url);
        HtmlElementCollection collection;
        List<HtmlElement> imgListString = new List<HtmlElement>();
        if (browser != null)
        {
            if (browser.Document != null)
            {
                collection = browser.Document.GetElementsByTagName("img");
                if (collection != null)
                {
                    foreach (HtmlElement element in collection)
                    {
                        WebClient wClient = new WebClient();
                        string urlDownload = element.FirstChild.GetAttribute("src");
                        wClient.DownloadFile(urlDownload, urlDownload.Substring(urlDownload.LastIndexOf('/')));
                    }
                }
            }
        }
    }
}
Upvotes: 1
Views: 6738
Reputation: 139
To anyone interested, here is the solution. It's exactly what Damith suggested. Html Agility Pack was the first thing I tried, but I found it to be rather broken, so this ended up being a more viable solution for me. This is my final code.
private void button2_Click(object sender, EventArgs e)
{
    string url = urlTextBox.Text;
    WebBrowser browser = new WebBrowser();
    // Subscribe before navigating so the completion event cannot be missed.
    browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(DownloadFiles);
    browser.Navigate(url);
}

private void DownloadFiles(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // The browser is local to button2_Click, so recover it from the event sender.
    WebBrowser browser = sender as WebBrowser;
    if (browser == null || browser.Document == null)
    {
        return;
    }

    HtmlElementCollection collection = browser.Document.GetElementsByTagName("img");
    foreach (HtmlElement element in collection)
    {
        string urlDownload = element.GetAttribute("src");
        if (urlDownload != null && urlDownload.Length != 0)
        {
            WebClient wClient = new WebClient();
            // "+ 1" skips the '/' itself so the file name has no leading slash.
            wClient.DownloadFile(urlDownload, "C:\\users\\folder\\location\\" + urlDownload.Substring(urlDownload.LastIndexOf('/') + 1));
        }
    }
}
Upvotes: 0
Reputation: 63065
Once you call Navigate, you assume the document is ready to traverse and check for images, but in practice it takes some time to load. You need to wait until the document has finished loading.
Add a DocumentCompleted event handler to your browser object:
browser.DocumentCompleted += browser_DocumentCompleted;
Implement it as:
static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // The sender is the WebBrowser that raised the event.
    WebBrowser browser = (WebBrowser)sender;
    if (browser == null || browser.Document == null)
    {
        return;
    }

    HtmlElementCollection collection = browser.Document.GetElementsByTagName("img");
    foreach (HtmlElement element in collection)
    {
        string urlDownload = element.GetAttribute("src");
        // Skip <img> tags that have no usable src attribute.
        if (string.IsNullOrEmpty(urlDownload))
        {
            continue;
        }
        WebClient wClient = new WebClient();
        // "+ 1" skips the '/' itself, so the file is saved under its bare name in the working directory.
        wClient.DownloadFile(urlDownload, urlDownload.Substring(urlDownload.LastIndexOf('/') + 1));
    }
}
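One caveat: building the local file name by slicing the URL string can be fragile when the src is relative or carries a query string. Below is a minimal sketch of an alternative based on System.Uri and Path.GetFileName. The class ImageUrlHelper and its two methods are hypothetical names introduced for illustration, not part of the code above, and the sketch assumes you know the page URL the src came from.

```csharp
using System;
using System.IO;

// Hypothetical helper, introduced for illustration only.
static class ImageUrlHelper
{
    // Resolves a src value against the page URL.
    // An absolute src is returned unchanged; a relative one is combined with the page URL.
    public static Uri Resolve(string pageUrl, string src)
    {
        return new Uri(new Uri(pageUrl), src);
    }

    // Derives a local file name from the URL's path component.
    // AbsolutePath excludes the query string, so "?size=large" is not part of the name.
    public static string LocalFileName(Uri imageUri)
    {
        string name = Path.GetFileName(imageUri.AbsolutePath);
        // Fall back to a placeholder when the path ends in '/'.
        return string.IsNullOrEmpty(name) ? "image" : name;
    }
}
```

For example, resolving img/photo.jpg?size=large against http://www.dot.com/gallery/index.html and passing the result to LocalFileName gives photo.jpg. Note that AbsolutePath is still percent-encoded, so a name containing spaces would come back with %20 in it.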
Upvotes: 3
Reputation: 17724
Take a look at Html Agility Pack.
What you need to do is download and parse the HTML, and then process the elements you are interested in. It is a good tool for such tasks.
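As a rough sketch of that approach, assuming the HtmlAgilityPack NuGet package is installed: load the page, select the img nodes that have a src attribute, and download each one. The class name ImageScraper, the method DownloadImages, and the targetFolder parameter are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack; // NuGet package, assumed to be installed

// Hypothetical helper, introduced for illustration only.
static class ImageScraper
{
    public static void DownloadImages(string pageUrl, string targetFolder)
    {
        // Download and parse the page in one step.
        HtmlDocument doc = new HtmlWeb().Load(pageUrl);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return;
        }

        Directory.CreateDirectory(targetFolder);
        Uri baseUri = new Uri(pageUrl);

        using (WebClient client = new WebClient())
        {
            foreach (HtmlNode img in nodes)
            {
                string src = img.GetAttributeValue("src", "");

                // Resolve relative src values against the page URL; skip anything malformed.
                Uri imageUri;
                if (!Uri.TryCreate(baseUri, src, out imageUri))
                {
                    continue;
                }

                // Skip non-HTTP sources such as inline data: URIs.
                if (imageUri.Scheme != Uri.UriSchemeHttp && imageUri.Scheme != Uri.UriSchemeHttps)
                {
                    continue;
                }

                string name = Path.GetFileName(imageUri.AbsolutePath);
                if (name.Length == 0)
                {
                    continue;
                }

                client.DownloadFile(imageUri, Path.Combine(targetFolder, name));
            }
        }
    }
}
```

Unlike the WebBrowser control, this parses the HTML as served and runs no scripts, so images that the page injects with JavaScript will not be found.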
Upvotes: 0