Reputation: 149
I am working on a C#/.NET project in which I have to get the source code of a web page and identify some specific tags.
For example, I have to find all the <img> tags in the code and store them in a variable.
I succeeded in the first step: my C# application can fetch the source code of a web page. However, I have no idea how to find a tag and store its position in a variable.
Any suggestions?
Upvotes: 0
Views: 1961
Reputation: 4652
I'd recommend using HtmlAgilityPack for this. It copes well with raw HTML markup and makes it easy to select elements by tag, e.g.:
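// Requires the HtmlAgilityPack NuGet package and "using HtmlAgilityPack;"
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml("<html><head></head><body><div><img /><div><img /><img/></div></div><img/></body></html>");
// SelectNodes returns null (not an empty collection) when nothing matches
var nodes = htmlDocument.DocumentNode.SelectNodes("//img");
// 4 nodes found
if (nodes != null)
{
    foreach (var node in nodes)
    {
        // do stuff, e.g. read node.OuterHtml or node.GetAttributeValue("src", "")
    }
}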
Upvotes: 0
Reputation: 2494
To parse HTML, use a dedicated library such as HtmlAgilityPack, and avoid regular expressions.
Here is an example of extracting links from a snippet of HTML; you can adapt it to get the img tags.
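For the img case, the adaptation might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is installed; the class name ImgTagFinder, the method FindImages and the sample HTML are made up for illustration. Each match is stored with its src attribute plus the line and column that the parser reports for the node, which is one way to keep a tag's position in a variable.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class ImgTagFinder
{
    // Hypothetical helper (not part of HtmlAgilityPack): collects the src
    // attribute and the parser-reported position of every <img> element.
    public static List<(string Src, int Line, int Column)> FindImages(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Src, int Line, int Column)>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img");
        if (nodes == null)
        {
            return result;
        }

        foreach (var node in nodes)
        {
            // GetAttributeValue falls back to "" when the tag has no src attribute.
            result.Add((node.GetAttributeValue("src", ""), node.Line, node.LinePosition));
        }
        return result;
    }

    static void Main()
    {
        // Sample markup, assumed purely for the demonstration.
        string html = "<html><body><img src=\"a.png\"><p>text</p><img src=\"b.png\"></body></html>";

        foreach (var img in FindImages(html))
        {
            Console.WriteLine($"{img.Src} at line {img.Line}, column {img.Column}");
        }
    }
}
```

If you need a character offset into the original string rather than a line and column, HtmlNode also exposes a StreamPosition property; check how it behaves on your input before relying on it.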
Upvotes: 3