paulthepaddy

Reputation: 21

Regex to get everything between tags across multiple lines in VB.Net

I'm parsing a page that contains a lot of markup, including 50 blocks like the one posted below.

HTML Block

<li>
    <dl>
        <dd>

        <a href="/wow/en/item/113987" class="color-q4" data-item="pl=100&amp;cc=5&amp;bl=566">




    <span  class="icon-frame frame-18 " style='background-image: url("http://media.blizzard.com/wow/icons/18/inv_misc_trinket6oih_lanternb1.jpg");'>
    </span>
</a>Obtained <a href="/wow/en/item/113987" class="color-q4" data-item="pl=100&amp;cc=5&amp;bl=566">Battering Talisman</a>.


</dd>
        <dt>22 hours ago</dt>
    </dl>
    </li>

The code I'm using now only matches this line:

Obtained <a href="/wow/en/item/113987" class="color-q4" data-item="pl=100&amp;cc=5&amp;bl=566">Battering Talisman</a>.

How can I get my MatchCollection to return the full HTML block as 1 match?

Dim explorer As New WowExplorer(WowDotNetAPI.Region.EU, Locale.en_GB, "apikey")
Dim Request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://eu.battle.net/wow/en/character/" & Me.Realm & "/" & Me.Name & "/feed")
Dim Response As System.Net.HttpWebResponse = Request.GetResponse
Dim sr As System.IO.StreamReader = New System.IO.StreamReader(Response.GetResponseStream())
Dim Sourecode As String = sr.ReadToEnd

Dim Item_ As New System.Text.RegularExpressions.Regex( _
    "Obtained <a href=""/wow/en/item/.*"" class=""color-q4"".*")

Dim matche_name As MatchCollection = Item_.Matches(Sourecode)
For Each Match As Match In matche_name
    Dim ItemID As String
    Dim ID_Match As String = Match.Value.Split("/").GetValue(4)
    ItemID = ID_Match.Split("""").GetValue(0)
    Me.Items.Add(explorer.GetItem(ItemID, ItemSource))
Next

Upvotes: 2

Views: 503

Answers (1)

Wiktor Stribiżew

Reputation: 626871

Here is sample code showing two ways to get those strings: with XDocument and XPath, and with a regex. (I added a second <li> to emulate the HTML you might have.)

' Requires: Imports System.Collections.Generic, System.Linq,
' System.Text.RegularExpressions, System.Xml.Linq, System.Xml.XPath
Dim dds As New List(Of String)
Dim dts As New List(Of String)
Dim str As String = "<li> <dl>         <dd>            <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">                <span class=""icon-frame frame-18 "" style='background-image: url(""http://media.blizzard.com/wow/icons/18/inv_misc_trinket6oih_lanternb1.jpg"");'>                </span>            </a>Obtained <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">Battering Talisman</a>.</dd>       <dt>22 hours ago</dt>    </dl>    </li>"
str += "<li> <dl>         <dd>            <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">                <span class=""icon-frame frame-18 "" style='background-image: url(""http://media.blizzard.com/wow/icons/18/inv_misc_trinket6oih_lanternb1.jpg"");'>                </span>            </a>Obtained <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">New Talisman</a>.</dd>       <dt>10 hours ago</dt>    </dl>    </li>"
' XPATH WAY
Dim xDoc As XDocument = XDocument.Parse("<?xml version= '1.0'?><root>" + str + "</root>")
dds = xDoc.XPathSelectElements("//dd").Select(Function(m) m.Value).ToList()
dts = xDoc.XPathSelectElements("//dt").Select(Function(m) m.Value).ToList()

' REGEX WAY
dds = New List(Of String)
dts = New List(Of String)
Dim rx As Regex = New Regex("(?s)</a>([^<]*?)<a\s[^>]*?>([^<]*?)</a>([^<\r\n]*)")
Dim matches As IEnumerable(Of Match) = rx.Matches(str).Cast(Of Match)()
dds = (From match In matches
       Select match.Groups(1).Value + match.Groups(2).Value + match.Groups(3).Value).ToList()
Dim rxDt As Regex = New Regex("(?s)<dt>\s*([^<]*?)\s*</dt>")
Dim matches_dts As IEnumerable(Of Match) = rxDt.Matches(str).Cast(Of Match)()
dts = (From match In matches_dts
       Select match.Groups(1).Value).ToList()

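To get each full <li> block back as one match, a possible approach is to let the dot match newlines. By default "." does not match a line feed, which is why the pattern in the question stops at the end of the "Obtained" line. The following is only a minimal sketch: the sample string is a shortened, hypothetical stand-in for the downloaded page source, and the names LiBlockSketch, source, blockRx and idRx are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LiBlockSketch
    Sub Main()
        ' Hypothetical, shortened stand-in for the downloaded page source.
        Dim source As String = String.Join(Environment.NewLine, {
            "<li>",
            "    <dl>",
            "        <dd>Obtained <a href=""/wow/en/item/113987"" class=""color-q4"">Battering Talisman</a>.</dd>",
            "        <dt>22 hours ago</dt>",
            "    </dl>",
            "</li>",
            "<li>",
            "    <dl>",
            "        <dd>Obtained <a href=""/wow/en/item/113987"" class=""color-q4"">New Talisman</a>.</dd>",
            "        <dt>10 hours ago</dt>",
            "    </dl>",
            "</li>"
        })

        ' Singleline makes "." match newlines as well; ".*?" is lazy,
        ' so each match ends at the nearest closing </li>.
        Dim blockRx As New Regex("<li>.*?</li>", RegexOptions.Singleline)

        ' Captures the numeric ID from the first item link inside a block.
        Dim idRx As New Regex("/wow/en/item/(\d+)")

        For Each block As Match In blockRx.Matches(source)
            ' block.Value holds the whole multi-line <li>...</li> block.
            Dim idMatch As Match = idRx.Match(block.Value)
            If idMatch.Success Then
                Console.WriteLine("Item ID: " & idMatch.Groups(1).Value)
            End If
        Next
    End Sub
End Module
```

Capturing the ID with a group avoids the Split("/") indexing from the question, which depends on the exact number of slashes before the ID. Note that this pattern assumes the <li> elements are not nested and carry no attributes; if the real page differs, an HTML parser is likely the more robust choice.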

Upvotes: 1
