Max

Reputation: 5063

How to extract text content from tags in .NET?

I'm trying to write a VB.NET function that extracts the text between a given start tag and end tag. I wrote this function:

Public Function GetTagContent(ByRef instance_handler As String, ByRef start_tag As String, ByRef end_tag As String) As String
    Dim s As String = ""
    Dim content() As String = instance_handler.Split(start_tag)
    If content.Count > 1 Then
        Dim parts() As String = content(1).Split(end_tag)
        If parts.Count > 0 Then
            s = parts(0)
        End If
    End If
    Return s
End Function

But it doesn't work. For example, with the following test code:

    Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"

    txtOutput.Text = testString.GetTagContent("<div style=""margin-top:20px"">", "</div>")

I get only "body>my example" instead of "text to extract".

Can anyone help me? Thanks in advance.


I wrote a new routine, and the following code works. However, I would like to know whether there is a better-performing way to do this:

    Dim s As New StringBuilder()
    Dim i As Integer = instance_handler.IndexOf(start_tag, 0)
    If i < 0 Then
        Return ""
    Else
        i = i + start_tag.Length
    End If
    Dim j As Integer = instance_handler.IndexOf(end_tag, i)
    If j < 0 Then
        s.Append(instance_handler.Substring(i))
    Else
        s.Append(instance_handler.Substring(i, j - i))
    End If
    Return s.ToString

Upvotes: 1

Views: 1510

Answers (1)

Steven Doggart

Reputation: 43743

XPath is one way of accomplishing this task. I'm sure others will suggest LINQ. Here's an example using XPath:

Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"
Dim doc As XmlDocument = New XmlDocument()
doc.LoadXml(testString)
MessageBox.Show(doc.SelectSingleNode("/body/div").InnerText)

Obviously, a more complex document may require a more complex XPath expression than simply "/body/div", but it's still pretty simple.
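As a rough sketch of what a slightly more specific path could look like, the expression below adds an attribute predicate so that only the div with the style value from the question matches. The module name is made up for illustration, and the predicate assumes the attribute value matches exactly. Keep in mind that LoadXml only accepts well-formed XML, so this approach assumes the markup can be parsed as XML.

```vb
Imports System
Imports System.Xml

' Hypothetical module name, used only for this illustration.
Module XPathAttributeExample
    Sub Main()
        Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"
        Dim doc As New XmlDocument()
        doc.LoadXml(testString)

        ' Match only the div whose style attribute has exactly this value.
        Dim node As XmlNode = doc.SelectSingleNode("/body/div[@style='margin-top:20px']")

        ' SelectSingleNode returns Nothing when no node matches,
        ' so check before reading InnerText.
        If node IsNot Nothing Then
            ' Trim removes the spaces that surround the text inside the div.
            Console.WriteLine(node.InnerText.Trim())
        End If
    End Sub
End Module
```

The Nothing check is there because calling InnerText directly on a missing node would throw a NullReferenceException.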

If you need to get a list of multiple elements that match the path, you can use doc.SelectNodes.
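A minimal sketch of the SelectNodes variant might look like this. The input string with two div elements and the module name are invented for the example; they are not from the question.

```vb
Imports System
Imports System.Xml

' Hypothetical module name, used only for this illustration.
Module SelectNodesExample
    Sub Main()
        ' Made-up input containing two div elements.
        Dim testString As String = "<body><div>first</div><div>second</div></body>"
        Dim doc As New XmlDocument()
        doc.LoadXml(testString)

        ' SelectNodes returns an XmlNodeList with every node that matches the path.
        For Each node As XmlNode In doc.SelectNodes("/body/div")
            Console.WriteLine(node.InnerText)
        Next
    End Sub
End Module
```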

Upvotes: 2
