Milton Cardoso

Reputation: 358

VB.NET Regex Match on XML data

I'm trying to run a regex against XML data (downloaded as a string) to get the id of the player whose name is selected in a listbox.

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles  Button2.Click
    'website
    Dim link As String = "https://s25-pt.ogame.gameforge.com/api/players.xml"

    Dim html As String
    'name selected on listbox
    Dim jogador As String = ListBox1.Text
    Dim pattern As String = "player id=""(.*?)"" name=""" & jogador & """"


    Dim webc1 As New WebClient
    webc1.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.0; es-ES; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3")

    html = webc1.DownloadString(link)


    Dim match As Match = Regex.Match(html, pattern)

    If match.Success Then
        MsgBox(match.Groups(1).Value)
    End If
End Sub

Instead of getting just the id, I get a big piece of the 'html' string.

I searched Google for answers and tried other patterns, but I can't work out how to solve this problem. Is there a way I can improve my regex?

I know this is XML and there is probably a more appropriate way to get the value, but I find this approach easier.

Upvotes: 1

Views: 715

Answers (2)

MrGadget

Reputation: 1268

I just couldn't resist this one, since running a regex against XML is just not a good idea.

Your link to the sample XML was kind enough to offer up a schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="players">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="player" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:attribute name="id" use="required" type="xs:integer"/>
                        <xs:attribute name="name" use="required" type="xs:string"/>
                        <xs:attribute name="status" use="optional">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:pattern value="(a|[vIibo]+)"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:attribute>
                        <xs:attribute name="alliance" type="xs:string"/>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="timestamp" type="xs:integer"/>
            <xs:attribute name="serverId" type="xs:string"/>
        </xs:complexType>
    </xs:element>
</xs:schema>

This produces the following two classes (the pattern restriction on status doesn't matter in this case):

Imports System.Net
Imports System.IO
Imports System.Text
Imports System.ComponentModel     ' DefaultValue attribute
Imports System.Xml.Schema         ' XmlSchemaForm enum
Imports System.Xml.Serialization
Imports System.Diagnostics
Imports System.Collections.Generic
Imports System.Linq

<XmlType(AnonymousType:=True, TypeName:="players"), XmlRoot(ElementName:="players")>
Public Class PlayerList
    <XmlElement("player", Form:=XmlSchemaForm.Unqualified, ElementName:="player")>
    Public Property Players() As New List(Of Player)

    <XmlAttribute(AttributeName:="timestamp"), DefaultValue(0)>
    Public Property Timestamp() As Integer

    <XmlAttribute(AttributeName:="serverId"), DefaultValue("")>
    Public Property ServerId() As String

    Public Function Find(PlayerName As String) As Player
        Return Players.FirstOrDefault(Function(p) p.Name = PlayerName)
    End Function
End Class

<XmlType(AnonymousType:=True, TypeName:="player"), XmlRoot("player")>
Public Class Player
    <XmlAttribute(AttributeName:="id"), DefaultValue(0)>
    Public Property Id() As Integer

    <XmlAttribute(AttributeName:="name"), DefaultValue("")>
    Public Property Name() As String

    <XmlAttribute(AttributeName:="status"), DefaultValue("")>
    Public Property Status() As String

    <XmlAttribute(AttributeName:="alliance"), DefaultValue("")>
    Public Property Alliance() As String
End Class

I've added a Find function in the PlayerList class for your button handler to call:

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    Dim Link As String = "https://s25-pt.ogame.gameforge.com/api/players.xml"
    Dim MyPlayers As PlayerList = Nothing

    ' Using disposes the WebClient even if DownloadString throws
    Using wc As New WebClient
        wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.0; es-ES; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3")
        ' Deserialize returns Object, so cast it back (required under Option Strict On)
        MyPlayers = DirectCast(Deserialize(wc.DownloadString(Link), GetType(PlayerList)), PlayerList)
    End Using

    Dim MyPlayer As Player = MyPlayers.Find(ListBox1.Text)
    If MyPlayer IsNot Nothing Then
        Debug.Print("Player ID: {0}", MyPlayer.Id)
        Debug.Print("Player Name: {0}", MyPlayer.Name)
        Debug.Print("Player Status: {0}", MyPlayer.Status)
        Debug.Print("Player Alliance: {0}", MyPlayer.Alliance)
    Else
        Debug.Print("Not Found")
    End If
End Sub

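Private Function Deserialize(XMLString As String, ObjectType As Type) As Object
    ' Wrap the string in a stream for XmlSerializer; Using disposes the stream afterwards
    Using ms As New MemoryStream(Encoding.UTF8.GetBytes(XMLString))
        Return New XmlSerializer(ObjectType).Deserialize(ms)
    End Using
End Function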

Testing with Fantasma2 I get the following output:

Player ID: 100110
Player Name: Fantasma2
Player Status: vI
Player Alliance: 4762
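If the serializer classes feel like a lot of setup, LINQ to XML is another regex-free option. This is a swapped-in alternative, not what this answer uses: a minimal console sketch assuming System.Xml.Linq (XDocument plus VB's @attribute axis property) and a hypothetical inline sample instead of the live download. XDocumentDemo and FindPlayerId are illustrative names.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module XDocumentDemo
    ' Hypothetical sample in the players.xml shape described by the schema
    Public ReadOnly Sample As String = "<players><player id=""100110"" name=""Fantasma2"" status=""vI"" alliance=""4762""/>" &
        "<player id=""1000042"" name=""sniper lord""/></players>"

    ' Returns the id attribute of the first player with the given name, or Nothing if absent
    Public Function FindPlayerId(xml As String, playerName As String) As String
        Dim doc As XDocument = XDocument.Parse(xml)
        ' p.@name reads the "name" attribute as a String (Nothing when missing)
        Dim player As XElement = doc.Root.Elements("player").FirstOrDefault(Function(p) p.@name = playerName)
        Return If(player Is Nothing, Nothing, player.@id)
    End Function

    Sub Main()
        Console.WriteLine(FindPlayerId(Sample, "Fantasma2"))   ' 100110
        Console.WriteLine(FindPlayerId(Sample, "sniper lord")) ' 1000042
    End Sub
End Module
```

In a real handler you would pass the downloaded string instead of Sample. Unlike a regex on the raw text, the parser also decodes entities such as &amp; in names for you.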

Upvotes: 1

Robin Mackenzie

Reputation: 19299

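If you try your regex on regex101, it appears to work fine, e.g. in PCRE/PHP mode. Against the downloaded string in .NET, though, it returns too much. The lazy .*? does not make the match start near the name you want: the engine starts at the leftmost player id=" that can lead to a match, so if several player elements sit on the same line, group 1 stretches from the first player's id across everything up to the one you asked for.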
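To see where the extra text comes from, here is a minimal console sketch of the question's pattern. It assumes the downloaded XML has several player elements on a single line; the sample string and the LazyMatchDemo/OriginalPatternGroup names are hypothetical (only the two ids and names are ones quoted on this page).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyMatchDemo
    ' Hypothetical sample: two player elements on ONE line, as an unformatted download might be
    Public ReadOnly Sample As String = "<players><player id=""100110"" name=""Fantasma2"" status=""vI"" alliance=""4762""/>" &
        "<player id=""1000042"" name=""sniper lord""/></players>"

    ' The question's pattern: .*? is lazy, but the match still begins at the leftmost "player id="""
    Public Function OriginalPatternGroup(xml As String, jogador As String) As String
        Dim pattern As String = "player id=""(.*?)"" name=""" & jogador & """"
        Dim m As Match = Regex.Match(xml, pattern)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Prints everything from the first player's id up to the id that was actually wanted
        Console.WriteLine(OriginalPatternGroup(Sample, "sniper lord"))
    End Sub
End Module
```

Running it prints 100110" name="Fantasma2" status="vI" alliance="4762"/><player id="1000042, i.e. the first player's id plus all the markup up to the id you wanted.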

So I tried this regex instead and got a proper match:

player id="(\d+)" name="sniper lord"

This gives a result of 1000042 from your data.

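\d+ just means one or more digits. Your XML data indicates that player IDs are numeric only, so this 'tightens up' the regex: \d+ cannot match a double quote, so the captured group can no longer run on into a neighbouring player element. This also uses sniper lord as a test value for jogador.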
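A minimal console sketch of the tightened pattern, run against a hypothetical one-line sample that holds two player elements (DigitMatchDemo and GetPlayerId are illustrative names, not part of the question's code):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitMatchDemo
    ' Hypothetical one-line sample with two player elements
    Public ReadOnly Sample As String = "<players><player id=""100110"" name=""Fantasma2"" status=""vI"" alliance=""4762""/>" &
        "<player id=""1000042"" name=""sniper lord""/></players>"

    ' (\d+) only captures digits, so a match cannot span from one player element into the next
    Public Function GetPlayerId(xml As String, jogador As String) As String
        Dim pattern As String = "player id=""(\d+)"" name=""" & jogador & """"
        Dim m As Match = Regex.Match(xml, pattern)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        Console.WriteLine(GetPlayerId(Sample, "sniper lord")) ' 1000042
        Console.WriteLine(GetPlayerId(Sample, "Fantasma2"))   ' 100110
    End Sub
End Module
```

The attempt starting at the first player now fails as soon as the name doesn't follow the digits, so the engine moves on to the next player id=" instead of swallowing the markup in between.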

You could also use String.Format to make the slightly confusing run of double quotes easier to read:

Dim pattern As String = String.Format("player id=""{0}"" name=""{1}""", "(\d+)", jogador)
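One caveat with building the pattern from ListBox1.Text: a player name containing regex metacharacters such as '.', '(' or ')' is interpreted as regex syntax. A minimal sketch that passes the name through Regex.Escape inside the same String.Format call; the player "Dr. Who (PT)" and the EscapedNameDemo/BuildPattern/GetPlayerId names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapedNameDemo
    ' Hypothetical player whose name contains regex metacharacters
    Public ReadOnly Sample As String = "<players><player id=""7"" name=""Dr. Who (PT)""/></players>"

    ' Regex.Escape makes the selected name match literally instead of as regex syntax
    Public Function BuildPattern(jogador As String) As String
        Return String.Format("player id=""{0}"" name=""{1}""", "(\d+)", Regex.Escape(jogador))
    End Function

    Public Function GetPlayerId(xml As String, jogador As String) As String
        Dim m As Match = Regex.Match(xml, BuildPattern(jogador))
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        Console.WriteLine(GetPlayerId(Sample, "Dr. Who (PT)")) ' 7
    End Sub
End Module
```

Without the escape, (PT) becomes a capture group and the pattern no longer matches the literal parentheses. Escaping does not help with characters that XML itself encodes: a name containing & or " appears in the raw file as &amp; or &quot;, so a regex over the raw text still misses it.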

Upvotes: 1
