Adam

Reputation: 6122

Remove all div elements from a string using VB.NET

I want to remove all div elements from my string, including ones with attributes such as class. I already checked RegEx match open tags except XHTML self-contained tags, so regex is apparently not the answer.

I already have a regex that removes all tags from a string while preserving the content (note: I'm never parsing a full HTML document, if that matters): Regex.Replace(s, "<[^>]*(>|$)", String.Empty). However, I only want the div tags removed, with their content preserved.

So I have:

<div class=""fade-content""><div><span>some  content</span></div></div>
<div>some  content</div> 

Desired output:

<span>some  content</span>
some  content

I was still going down the regex path and tried something like <div>.*<\/div>, but that doesn't match divs with attributes.

How can I remove div elements only, using VB.NET?

Upvotes: 0

Views: 379

Answers (2)

It all makes cents

Reputation: 4983

This can be achieved without regular expressions by using a WebBrowser control. Try the following:

ExtractDesiredData:

Private Function ExtractDesiredData(html As String) As List(Of String)
    Dim result As List(Of String) = New List(Of String)()

    'create new instance
    Using wb As WebBrowser = New WebBrowser()
        wb.Navigate(New Uri("about:blank"))

        'create reference
        Dim doc As HtmlDocument = wb.Document

        'add html to document
        doc.Write(html)

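        'loop through all body elements and keep only innermost divs
        '(divs with no nested div), so the outer wrapper divs are skipped
        For Each elem As HtmlElement In doc.Body.All
            If elem.TagName = "DIV" AndAlso elem.GetElementsByTagName("DIV").Count = 0 Then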
                Debug.WriteLine($"DIV elem InnerHtml: '{elem.InnerHtml}'")

                'add
                result.Add(elem.InnerHtml)
            End If
        Next
    End Using

    Return result
End Function

Usage:

Dim html As String = "<div class=""fade-content""><div><span>some  content</span></div></div>"
html &= vbCrLf & "<div>some  content</div>"

Dim desiredData As List(Of String) = ExtractDesiredData(html)


Upvotes: 0

Calaf

Reputation: 1173

There are several ways to do this. A short and simple one is:

Regex.Replace(s, "</?div.*?>", String.Empty)

Here is an example:

    's simulates your html input
    Dim s As String = "<div class=""fade-content""><div><span>some  content</span></div></div>" & Environment.NewLine & "<div>some  content</div>"

    'store the result in s1
    Dim s1 As String = Text.RegularExpressions.Regex.Replace(s, "</?div.*?>", String.Empty)

    'output
    MessageBox.Show(s1)

Output:

<span>some  content</span>
some  content
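If the input might contain uppercase tags, or other tags whose names merely begin with "div", a slightly tighter pattern may help. The sketch below is an assumption, not part of the answer above: `\b` requires the tag name to end right after `div`, `[^>]*` consumes attributes up to the first `>`, and `RegexOptions.IgnoreCase` also matches `<DIV>`. Like any regex approach, it assumes there is no `>` inside attribute values and no div tags inside comments or scripts. The names `DivStripper` and `RemoveDivTags` are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DivStripper
    ' Matches an opening <div ...> or a closing </div> tag, case-insensitively.
    ' \b prevents matching tags that only start with "div" (e.g. <divider>).
    ' [^>]* assumes no ">" character appears inside attribute values.
    Private ReadOnly DivTag As New Regex("</?div\b[^>]*>", RegexOptions.IgnoreCase)

    ' Removes every div start/end tag and keeps the content between them.
    Public Function RemoveDivTags(html As String) As String
        Return DivTag.Replace(html, String.Empty)
    End Function

    Sub Main()
        ' Same sample as the question, with the second div in uppercase.
        Dim s As String = "<div class=""fade-content""><div><span>some  content</span></div></div>" &
                          Environment.NewLine & "<DIV>some  content</DIV>"
        Console.WriteLine(RemoveDivTags(s))
    End Sub
End Module
```

Run as a console app, this should print the same two lines as the desired output in the question. Keeping the `Regex` in a shared field avoids re-parsing the pattern on every call; the static `Regex.Replace` overload also caches recently used patterns, so this is a minor choice.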

Upvotes: 3
