Simos Sigma

Reputation: 978

Grab a specific part of text from a local HTML file and use it as a variable

I am making a small "home" application using VB. As the title says, I want to grab a part of the text from a local HTML file and use it as a variable, or put it in a textbox.

I have tried something like this...

Private Sub Open_Button_Click(sender As Object, e As EventArgs) Handles Open_Button.Click
    Dim openFileDialog As New OpenFileDialog()
    openFileDialog.CheckFileExists = True
    openFileDialog.CheckPathExists = True
    openFileDialog.FileName = ""
    openFileDialog.Filter = "All|*.*"
    openFileDialog.Multiselect = False
    openFileDialog.Title = "Open"

    If openFileDialog.ShowDialog = Windows.Forms.DialogResult.OK Then
        Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog.FileName)
        TextBox.Text = fileReader
    End If
End Sub

As it stands, this loads the whole HTML source into the textbox. What should I do to grab only a specific part of the HTML file's code? Let's say I want to grab only the word text from this span: <span id="something">This is a text!!!</span>

Upvotes: 0

Views: 1019

Answers (2)

Visual Vincent

Reputation: 18310

Using an HTML parser is highly recommended, because HTML allows arbitrarily nested tags that regular expressions handle poorly (see this question for example).

However, finding the contents of a single tag using Regex works without major problems as long as the HTML is well formed.

This would be what you need (the function is case-insensitive):

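' Requires "Imports System.Text.RegularExpressions" at the top of the file.
Public Function FindTextInSpan(ByVal HTML As String, ByVal SpanId As String, ByVal LookFor As String) As String
    ' Regex.Escape makes sure special characters in the id are matched literally.
    Dim m As Match = Regex.Match(HTML, "(?<=<span.+id=""" & Regex.Escape(SpanId) & """.*>.*)" & LookFor & "(?=.*<\/span>)", RegexOptions.IgnoreCase)
    ' Regex.Match never returns Nothing, so check Success instead.
    Return If(m.Success, m.Value, "")
End Function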

The parameters of the function are:

HTML: The HTML code as string.

SpanId: The id of the span (e.g. in <span id="hello">, hello is the id).

LookFor: What text to look for inside the span.

Online test: http://ideone.com/luGw1V
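
To make the call concrete, here is a minimal, self-contained sketch that runs the function above against the span from the question. GetSpanText is a hypothetical helper that is not part of the answer itself; it assumes you want the whole content of the span rather than a known word, and that the span contains no nested span tags.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SpanRegexDemo
    ' Same pattern as the function above, with the id escaped.
    Public Function FindTextInSpan(ByVal HTML As String, ByVal SpanId As String, ByVal LookFor As String) As String
        Dim pattern As String = "(?<=<span.+id=""" & Regex.Escape(SpanId) & """.*>.*)" & LookFor & "(?=.*<\/span>)"
        Dim m As Match = Regex.Match(HTML, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Value, "")
    End Function

    ' Hypothetical helper (an assumption, not from the answer): returns
    ' everything between the opening tag with the given id and the next </span>.
    Public Function GetSpanText(ByVal html As String, ByVal spanId As String) As String
        Dim pattern As String = "<span\b[^>]*\bid=""" & Regex.Escape(spanId) & """[^>]*>(.*?)</span>"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, "")
    End Function

    Sub Main()
        Dim html As String = "<p>Intro</p><span id=""something"">This is a text!!!</span>"

        ' Looks for the word "text" inside the span with id "something".
        Console.WriteLine(FindTextInSpan(html, "something", "text"))

        ' Returns the full content of the same span.
        Console.WriteLine(GetSpanText(html, "something"))
    End Sub
End Module
```

In the question's click handler you would pass the fileReader string instead of the hard-coded html and assign the result to TextBox.Text.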

Upvotes: 1

stormCloud

Reputation: 993

I make the following assumptions in this answer.

  1. Your html is valid - i.e. the id is completely unique in the document.
  2. You will always have an id on your html tag
  3. You'll always be using the same tag (e.g. span)

I'd do something like this:

' get the html document
Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog.FileName)

' split the html text on the opening span tag
Dim fileSplit As String() = fileReader.Split(New String() {"<span id=""something"">"}, StringSplitOptions.None)

' keep everything after the opening tag
' (Last and First are LINQ extension methods; add "Imports System.Linq" if it is not already imported)
fileReader = fileSplit.Last()

' split again on the closing tag to trim everything after it
fileSplit = fileReader.Split(New String() {"</span>"}, StringSplitOptions.None)

' keep the first part of the text
fileReader = fileSplit.First()

' the fileReader variable should now contain the contents of the span tag with id "something"

Note: this code is untested and I typed it on the Stack Exchange mobile app, so there might be some autocorrect typos in it.

You might want to add in some error validation such as making sure that the span element only occurs once, etc.
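
As a rough idea of what that validation could look like, here is a minimal sketch that uses IndexOf instead of Split. ExtractSpanText is a hypothetical name, and the sketch assumes the opening tag is written exactly as passed in and that the span contains no nested span tags.

```vb
Imports System

Module SpanSplitDemo
    ' Hypothetical helper: returns the text between the opening tag and the
    ' next </span>, or Nothing when the opening tag is missing, occurs more
    ' than once, or is never closed.
    Public Function ExtractSpanText(ByVal html As String, ByVal openTag As String) As String
        Const closeTag As String = "</span>"

        Dim first As Integer = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase)
        If first < 0 Then Return Nothing ' opening tag not found

        Dim contentStart As Integer = first + openTag.Length
        If html.IndexOf(openTag, contentStart, StringComparison.OrdinalIgnoreCase) >= 0 Then
            Return Nothing ' opening tag occurs more than once
        End If

        Dim contentEnd As Integer = html.IndexOf(closeTag, contentStart, StringComparison.OrdinalIgnoreCase)
        If contentEnd < 0 Then Return Nothing ' span is never closed

        Return html.Substring(contentStart, contentEnd - contentStart)
    End Function

    Sub Main()
        Dim html As String = "<p>Intro</p><span id=""something"">This is a text!!!</span>"
        Dim result As String = ExtractSpanText(html, "<span id=""something"">")

        ' Fall back to a placeholder when nothing usable was found.
        Console.WriteLine(If(result, "(not found)"))
    End Sub
End Module
```

Returning Nothing lets the caller tell "span missing or ambiguous" apart from an empty span, which would return an empty string.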

Upvotes: 1
