mightymax

Reputation: 431

Regular expression is treating group as a string

I have a regular expression that embeds the value matched by another regex. When I test it, the second regex's group is not captured; instead, the group seems to be treated as a plain string. How can I get this regex to return the group?

Private Sub CreateGraphicsFunction(sender As Object, e As EventArgs)
    Dim Regex = New Regex("infoEntityIdent=""(ICN.+?)[""].*?[>]")

    Dim ICNFiles = Directory.EnumerateFiles(MoveToPath, "*.*", SearchOption.AllDirectories)

    For Each tFile In ICNFiles
        Dim input = File.ReadAllText(tFile)

        Dim match = Regex.Match(input)
        If match.Success Then
            GraphicList.Add(match.Groups(1).Value)
            Dim Regex2 = New Regex("<!ENTITY " & match.Groups(1).Value & "  SYSTEM ""(ICN.+?[.]\w.+?)[""]")
            Debug.Write(Regex2)    ' outputs !ENTITY ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01  SYSTEM "(ICN.+?[.]\w.+)["]
            Dim sysFileMatch = Regex2.Match(input)

            If sysFileMatch.Success Then
                ICNList.Add(sysFileMatch.Groups(1).Value)
                Debug.Write("found ICN " & sysFileMatch.Groups(1).Value)
            End If
        End If
    Next
End Sub

For example, the first regex captures the ICN number:

New Regex("infoEntityIdent=""(ICN.+?)[""].*?[>]")

From there, I want to use the value captured in that group to go through the file again and find the matching ICN with its extension. So I insert the captured group from the first regex into a new regex:

New Regex("<!ENTITY " & match.Groups(1).Value & "  SYSTEM ""(ICN.+?[.]\w.+?)[""]")

When I print this regex, the output is:

!ENTITY ICN-GAASIB0-00-051105-A-0YJB5-00005-A-001-01  SYSTEM "(ICN.+?[.]\w.+)["]

It appears to ignore the second regex's grouping and treat it as part of the string instead of using it as a group. What I want is the ICN number with its extension after SYSTEM.

Latest code sample I tried to get it to work:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

    Dim Files = Directory.EnumerateFiles(MovePath, "*.*", SearchOption.AllDirectories)

    For Each tFile In Files
        Dim input = File.ReadAllText(tFile)
        Dim strREGEX = New Regex("(?=[\S\s]*?infoEntityIdent\s*=\s*""\s*(ICN[\S\s]+?)\s*""[\S\s]*?>)[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+""\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*")
        Dim match = strREGEX.Match(tFile)
        If match.Success Then
            Debug.Write(match.Groups(2).Value)
        Else
            Debug.Write(match.Groups(2).Value & " was not found")
        End If
    Next
End Sub

Upvotes: 0

Views: 113

Answers (2)

user557597


Combine both regexes into a single regex.
This avoids the error-prone step of building the second pattern by hand from the first match.

Below are your two regexes combined into one.
I've adjusted it to tolerate optional whitespace and to use [\S\s] in place of . so that a match can span lines.
I have no way of checking whether it matches, since you haven't posted a target string.

Raw: (?=[\S\s]*?infoEntityIdent\s*=\s*"\s*(ICN[\S\s]+?)\s*"[\S\s]*?>)[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+"\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*"

Stringed (C# verbatim literal; in VB, drop the leading @): @"(?=[\S\s]*?infoEntityIdent\s*=\s*""\s*(ICN[\S\s]+?)\s*""[\S\s]*?>)[\S\s]*?<!ENTITY\s+\1\s+SYSTEM\s+""\s*(ICN[\S\s]+?\.\w[\S\s]+?)\s*"""

Formatted and Explained:

 (?=                           # Look ahead to find the ID ICN
      [\S\s]*? 
      infoEntityIdent \s* = \s* 
      "
      \s* 
      ( ICN [\S\s]+? )              # (1), Entity IDent ICN
      \s* 
      " 
      [\S\s]*? >
 )
                               # Consume now:
 [\S\s]*?                      # Find the ID ICN inside an ENTITY
 <!ENTITY \s+ 
 \1                            # Back reference to Entity IDent ICN
 \s+ SYSTEM \s+ 
 "
 \s* 
 (                             # (2 start), Some other ICN junk
      ICN
      [\S\s]+? 
      \. 
      \w 
      [\S\s]+? 
 )                             # (2 end)
 \s* 
 "

Upvotes: 1

Dean Taylor

Reputation: 42051

You will most likely want to escape the value captured by your first search before embedding it in the new regular expression, so that any regex metacharacters it contains are matched literally.

Something like:

Dim EscapedSearchValue As String = Regex.Escape(match.Groups(1).Value)
Dim Regex2 = New Regex("<!ENTITY " & EscapedSearchValue & "  SYSTEM ""(ICN.+?[.]\w.+?)[""]")

See Regex.Escape(String) Method
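To illustrate why escaping can matter, here is a small sketch. The `captured` value and the `sampleText` string are hypothetical, chosen so that the captured value contains a `.`; real ICN values may not contain any metacharacters at all. The sketch also shows that printing a `Regex` object outputs its pattern text, so the group always appears literally there; the captured text itself comes from `Match.Groups(1).Value`.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        ' Hypothetical captured value containing a regex metacharacter (.).
        Dim captured As String = "ICN-A.01"
        ' Hypothetical input: the entity name has "X" where the captured value has ".".
        Dim sampleText As String = "<!ENTITY ICN-AX01  SYSTEM ""ICN-AX01.TIF"">"

        ' Same tail as in the question, including the two literal spaces before SYSTEM.
        Dim tail As String = "  SYSTEM ""(ICN.+?[.]\w.+?)[""]"
        Dim unescaped As New Regex("<!ENTITY " & captured & tail)
        Dim escaped As New Regex("<!ENTITY " & Regex.Escape(captured) & tail)

        ' ToString() returns the pattern text, so the group shows up literally here.
        Console.WriteLine(escaped.ToString())

        ' Unescaped, "." matches any character, so the wrong entity is accepted.
        Console.WriteLine(unescaped.IsMatch(sampleText))   ' True
        ' Escaped, "." must be a literal dot, so the wrong entity is rejected.
        Console.WriteLine(escaped.IsMatch(sampleText))     ' False
    End Sub
End Module
```

Note that the pattern from the question contains two literal spaces before SYSTEM, so the input must contain exactly that spacing for a match; replacing them with `\s+` would be more forgiving.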

Upvotes: 0
