String.Split returning incorrect array

I'm trying to repair an incorrectly formatted HTML table. I have no control over the source; my application just loads the contents of a downloaded file as a regular text file. The file contains a simple HTML table that is missing the closing </tr> elements. I'm attempting to split the contents on <tr> to get an array in which I can add a </tr> to the end of each element that needs it. When I split the string using fileContents.Split("<tr>").ToList, I get many more elements in the resulting List(Of String) than there should be.

Here is a short test snippet that shows the same behavior:

Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
Dim testArr As String() = testSource.Split("<tr>")

'Maybe try splitting on a variable, in case a string literal containing "<>" can't be used in the Split method
Dim seper As String = "<tr>"
testArr = testSource.Split(seper)

'Feed it a new string directly
testArr = testSource.Split(New String("<tr>"))

I would expect that testArr should contain 3 elements, as follows:

  1. "<table>"
  2. "<td>8172745</td>"
  3. "<td>8172745</td></table>"

However, I am receiving the following array:

  1. ""
  2. "table>"
  3. "tr>"
  4. "td>8172745"
  5. "/td>"
  6. "tr>"
  7. "td>8172745"
  8. "/td>"
  9. "/table>"

Can someone please explain why the strings are being split the way they are and how I can go about getting the results I'm expecting?

Upvotes: 1

Views: 297

Answers (2)

Justin Niessner

Reputation: 245419

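Your code is using a different overload of the Split method than you're expecting. Split("<tr>") binds to the overload that takes a ParamArray of Char, and with Option Strict Off the string is implicitly narrowed to its first character, so you are really splitting on "<". That is why every element in your result has lost its leading "<". You want the overload that takes a String() and a StringSplitOptions parameter: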

Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
' Passing a String array selects the overload that splits on the whole "<tr>" token
Dim delimiter As String() = { "<tr>" }
Dim testArr As String() = _
    testSource.Split(delimiter, StringSplitOptions.RemoveEmptyEntries)

You can see it working at IDEOne:

http://ideone.com/pcw6aq
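
Once the split returns the three expected pieces, adding the missing </tr> tags could look something like the sketch below. This is only an illustration: CloseRows and RowRepairSketch are hypothetical names, and it assumes that no row already has a closing </tr> and that the table ends with a literal </table>.

```vb
Imports System
Imports System.Text

Module RowRepairSketch
    ' Hypothetical helper: puts the <tr> delimiters back and appends the missing </tr>.
    ' Assumes no row is already closed and that the last row ends with </table>.
    Function CloseRows(source As String) As String
        ' StringSplitOptions.None keeps parts(0) as "everything before the first <tr>",
        ' even when that prefix is empty.
        Dim parts As String() = source.Split(New String() {"<tr>"}, StringSplitOptions.None)
        Dim result As New StringBuilder(parts(0))

        For i As Integer = 1 To parts.Length - 1
            Dim row As String = parts(i)
            result.Append("<tr>")
            If row.EndsWith("</table>", StringComparison.OrdinalIgnoreCase) Then
                ' Last row: the closing tag belongs before </table>.
                result.Append(row.Substring(0, row.Length - "</table>".Length))
                result.Append("</tr></table>")
            Else
                result.Append(row).Append("</tr>")
            End If
        Next

        Return result.ToString()
    End Function

    Sub Main()
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
        ' Prints: <table><tr><td>8172745</td></tr><tr><td>8172745</td></tr></table>
        Console.WriteLine(CloseRows(testSource))
    End Sub
End Module
```

The split uses StringSplitOptions.None rather than RemoveEmptyEntries so that the first element is always the text before the first row, which keeps the loop index simple.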

Upvotes: 2

Mohammad Farah

Reputation: 11

Try using Regex.Split, which treats the whole pattern as the delimiter, like this:

Imports System.Text.RegularExpressions

Public Class Form1


    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
        Dim testArr As String() = Regex.Split(testSource, "<tr>")

        'Show The Array in TextBox1
        TextBox1.Lines = testArr

    End Sub
End Class
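
If the downloaded file might contain row tags in a different case or with attributes, the pattern can be loosened. The following is a minimal console sketch under that assumption. The sample input and the module name RegexSplitSketch are made up for illustration and are not taken from the question's real data.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexSplitSketch
    Sub Main()
        ' Hypothetical input: one upper-case tag and one tag with an attribute.
        Dim testSource As String = "<table><TR><td>8172745</td><tr class=""x""><td>8172745</td></table>"

        ' <tr, a word boundary (so a longer tag name is not matched),
        ' then any attributes up to the closing >.
        Dim parts As String() = Regex.Split(testSource, "<tr\b[^>]*>", RegexOptions.IgnoreCase)

        ' Prints three lines:
        ' <table>
        ' <td>8172745</td>
        ' <td>8172745</td></table>
        For Each part As String In parts
            Console.WriteLine(part)
        Next
    End Sub
End Module
```

Note that the matched tags, including any attributes, are discarded by the split, so this only suits cases where the rows can be rebuilt with a plain <tr>.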

All The Best

Upvotes: 1
