Reputation: 75
I'm trying to get a specific div from a text file filled with divs. I'm using StreamReader to read the file, but I don't know how to get the complete div. After getting the div, I'm going to turn each line into a string, which will be added to a list. The text file is as follows:
<div id="#SMINLANGUAGE1 ">
English
Hello.
This is a Test
Test 23
</div>
<div id="#SMINLANGUAGE2 ">
Dutch
Hallo.
Dit is een Test
Test 29
</div>
<div id="#SMINLANGUAGE3 ">
Spanish
Hola.
Esto es una Prueba.
Prueba 86
</div>
The list for English would be:
Index 0: English
Index 1: Hello.
Index 2: This is a Test
Index 3: Test 23
Upvotes: 0
Views: 634
Reputation: 19156
First, install HtmlAgilityPack to parse the HTML:
Install-Package HtmlAgilityPack
Then, by selecting the //div path, we can extract all of the available divs from the HTML content:
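using System;
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

var doc = new HtmlDocument
{
    OptionOutputAsXml = true,
    OptionCheckSyntax = true,
    OptionFixNestedTags = true,
    OptionAutoCloseOnEnd = true,
    OptionDefaultStreamEncoding = Encoding.UTF8
};
// htmlContent is the text of the file, read beforehand (for example with File.ReadAllText).
doc.LoadHtml(htmlContent);

var results = new List<string[]>();
// SelectNodes returns null, not an empty collection, when nothing matches.
var divs = doc.DocumentNode.SelectNodes("//div");
if (divs != null)
{
    foreach (var node in divs)
    {
        var divContent = node.InnerText;
        if (string.IsNullOrWhiteSpace(divContent))
            continue;
        // Split on both '\r' and '\n' so Windows line endings leave no stray '\r'.
        var lines = divContent.Trim().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        results.Add(lines);
    }
}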
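The loop above collects every div. Since the question asks for one specific div, here is a minimal sketch of narrowing the XPath to a single id. The class name DivLines, the helper GetLines, and the sample string in Main are hypothetical names made up for illustration. The sketch assumes the id contains no single quote, and it uses normalize-space so the trailing space in ids such as "#SMINLANGUAGE1 " does not prevent a match.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DivLines
{
    // Hypothetical helper: returns the trimmed, non-empty lines of the div
    // whose id equals `id` once surrounding whitespace is ignored.
    // Assumption: `id` contains no single quote (it is embedded in the XPath).
    public static List<string> GetLines(string htmlContent, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlContent);

        // normalize-space strips the trailing space found in the file's ids.
        var node = doc.DocumentNode.SelectSingleNode(
            $"//div[normalize-space(@id)='{id}']");

        var lines = new List<string>();
        if (node == null)
            return lines; // no div with that id

        foreach (var line in node.InnerText.Split(
                     new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var trimmed = line.Trim();
            if (trimmed.Length > 0)
                lines.Add(trimmed);
        }
        return lines;
    }

    public static void Main()
    {
        // Sample input shaped like the file in the question.
        var html =
            "<div id=\"#SMINLANGUAGE1 \">\nEnglish\nHello.\n</div>\n" +
            "<div id=\"#SMINLANGUAGE2 \">\nDutch\nHallo.\n</div>";

        foreach (var line in GetLines(html, "#SMINLANGUAGE2"))
            Console.WriteLine(line);
    }
}
```

Returning a List<string> matches the "one list entry per line" layout described in the question. An empty list signals that no div with the requested id was found.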
Upvotes: 1