Jorge

Reputation: 18237

Need Help Parsing Text Between HTML Tags

OK, the problem is that I have a string containing HTML. I need to find a specific pattern like this:

<span class="fieldText">some text</span>

From that HTML, I need to extract "some text" and save it into a list. How can I accomplish this?

Note that the text can appear like this:

<p>
    Central: 
<span class="fieldText">Central_Local</span><br>Area Resolutoria:  
<span class="fieldText">Area_Resolutoria</span><br>VPI:  
<span class="fieldText">VIP</span><br>Ciudad: <span class="fieldText">Ciudad</span>   <br>Estado:  <span class="fieldText">Estado</span><br>Region  <span class="fieldText">Region</span>    
</p>

Upvotes: 2

Views: 2489

Answers (5)

Petar Ivanov

Reputation: 93020

You can try a regex: @"<span .*?>(.*?)</span>". If you combine it with captures, you can get the whole list from a single match with @"^(.*?<span .*?>(.*?)</span>.*?)+$".

But the truth is that you shouldn't use regex to parse XML or HTML; there are plenty of parsers out there, as others have already mentioned.

            string s = @"
<p>
    Central: 
<span class=""fieldText"">Central_Local</span><br>Area Resolutoria:  
<span class=""fieldText"">Area_Resolutoria</span><br>VPI:  
<span class=""fieldText"">VIP</span><br>Ciudad: <span class=""fieldText"">Ciudad</span>   <br>Estado:  <span class=""fieldText"">Estado</span><br>Region  <span class=""fieldText"">Region</span>    
</p>";

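            // Requires: using System; using System.Text.RegularExpressions;
            // Singleline lets '.' match newlines, so one match can span the whole string.
            Match m = Regex.Match(s, @"^(.*?<span .*?>(.*?)</span>.*?)+$", RegexOptions.Singleline);

            // Group 2 is captured once per repetition, so Captures holds every span's text.
            foreach (Capture capture in m.Groups[2].Captures)
                Console.WriteLine(capture.Value);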
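
If the goal is a list rather than console output, the simpler first pattern can be applied with Regex.Matches. The following is only a minimal sketch, not the answerer's code: the helper name ExtractFieldTexts and the shortened sample string are made up for illustration, the pattern is narrowed to class="fieldText" exactly as written in the question, and the program is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): returns the inner text of every
// <span class="fieldText"> element found in the input string.
static List<string> ExtractFieldTexts(string html)
{
    var results = new List<string>();
    // The lazy (.*?) stops at the first </span>; Singleline lets '.' cross line breaks.
    string pattern = @"<span\s+class=""fieldText"">(.*?)</span>";
    foreach (Match match in Regex.Matches(html, pattern, RegexOptions.Singleline))
        results.Add(match.Groups[1].Value);
    return results;
}

// Shortened sample based on the markup in the question.
string sample = @"<p>Central: <span class=""fieldText"">Central_Local</span><br>Ciudad: <span class=""fieldText"">Ciudad</span></p>";
foreach (string text in ExtractFieldTexts(sample))
    Console.WriteLine(text);
```

This assumes the attribute is always written as class="fieldText" with double quotes and no other attributes, and that the spans are never nested. If the markup varies more than that, a parser is the safer choice.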

Upvotes: 2

BC.

Reputation: 24908

Regex has been shown to be a bad solution for parsing HTML. The HTML Agility Pack is exactly what you need for this task.
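
As a rough sketch of what that could look like (assuming the HtmlAgilityPack NuGet package is referenced; the helper name ExtractWithAgilityPack and the sample string are hypothetical, and the code uses top-level statements from C# 9 or later):

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

// Hypothetical helper: collects the text of every <span class="fieldText"> element.
static List<string> ExtractWithAgilityPack(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    var results = new List<string>();
    // SelectNodes returns null, not an empty collection, when nothing matches.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//span[@class='fieldText']");
    if (nodes == null)
        return results;

    foreach (HtmlNode node in nodes)
        results.Add(node.InnerText.Trim());
    return results;
}

// Shortened sample based on the markup in the question.
string markup = @"<p>Central: <span class=""fieldText"">Central_Local</span><br>Ciudad: <span class=""fieldText"">Ciudad</span></p>";
foreach (string text in ExtractWithAgilityPack(markup))
    Console.WriteLine(text);
```

The XPath expression matches only spans whose class attribute is exactly fieldText. Unlike a regex, it keeps working if the spans gain extra whitespace inside the tag or single-quoted attributes.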

Upvotes: 0

Chuck Savage

Reputation: 11945

Have you tried the HtmlAgilityPack?

Upvotes: 1

user428517

Reputation: 4193

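For small stuff like this I prefer using regular expressions. I'm not sure what the C# syntax is, but the expression would look something like this (with a lazy quantifier, so that each match stops at the first closing tag when several spans share a line):

|<span class="fieldText">(.+?)</span>|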

Jonathan Wood's suggestion for using an HTML tag parser is a good idea too, especially if you'll be doing a lot of parsing.

Upvotes: 0

Jonathan Wood

Reputation: 67195

I don't like using regular expressions for stuff like this.

I've written a free HTML tag parser that you could either use as is, modify to fit your needs, or just use as a guide to how you might approach this on your own.

Upvotes: 2
