Reputation: 11182
Is there any way to parse a complex RegEx pattern (containing several named groups as well as several numbered groups and non-capturing groups) and report each group name or group number along with its pattern text?
Suppose I have a RegEx pattern like this:
(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))))
And I would like to extract:
Named groups:
x==>(?<x>\b[a-s03]+\b)
a==>(?<a>\p{L}+?,(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))))
b==>(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30)))))
c==>(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))
d==>(?<d>.+?(?:\d|sample-text|(\k'x'|sos30)))
Numbered groups:
1==>(\k'x'|sos30)
Non-capturing-groups:
1st==>(?:\d|sample-text|(\k'x'|sos30))
Purpose of this Requirement:
I have a large database of complex RegEx patterns. The previous programmer who worked on this did not use any comments [(?#...)] while preparing these complex patterns; moreover, no line breaks exist within those patterns. I have to modify those patterns in some cases and also add comments within them. Now it is like searching for a needle in a haystack. I simply could not use RegEx for this purpose, so I am inclined to use a parser for this case.
What I tried:
I tried the GetGroupNames and GetGroupNumbers methods for that purpose. I could extract only the names/numbers of the groups, but not the corresponding pattern text.
I am looking for a non-RegEx solution or some hints.
Upvotes: 3
Views: 320
Reputation: 114741
There is an internal RegexParser class in the System.Text.RegularExpressions namespace, which you can call using private reflection. I have a sample implementation that I've been using in my FxCopContrib project so far.
There's the RegexParser implementation from the Mono project which you might be able to leverage.
Then there's Deveel's Regex library.
Upvotes: 0
Reputation: 6258
How about this? For this pattern:
(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))))
this is the output:
(0)<0>: (?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))))
(1)<x>: \b[a-s03]+\b
(2)<a>: \p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))
(3)<b>: .+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30)))
(4)<c>: .+?;(.+?(?:\d|sample-text|(\k'x'|sos30))
(5)<5>: .+?(?:\d|sample-text|(\k'x'|sos30)
(6)<6>: \k'x'|sos30
This is the code:
Imports System.Collections.Specialized

Module Module1
    ' Group number -> output line: the "(n)<name>: " label plus the pattern text collected so far.
    Public DictGroups As New OrderedDictionary
    ' Group number -> True while that group is still open.
    Public DictTrackers As New Dictionary(Of Integer, Boolean)
    Public intGroups As Integer = 0
    ' Set after a "(?" that starts neither a non-capturing nor a named group, e.g. (?im).
    Public CommandGroup As Boolean = False

    Sub Main()
        Dim regexToEval As String = "(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))))"
        Dim curChar As String = ""
        ' Group 0 is the whole pattern.
        DictGroups.Add(0, "(0)<0>: " & vbTab)
        DictTrackers.Add(0, True)
        For i = 1 To regexToEval.Length
            Dim iChar As String = regexToEval.Substring(i - 1, 1)
            ' An unescaped ")" closes the innermost open group before the character is recorded.
            If curChar <> "\" AndAlso iChar = ")" Then EndGroup()
            AddStrToTrackers(iChar)
            If curChar = "\" OrElse iChar <> "(" OrElse regexToEval.Length < i + 2 Then curChar = iChar : Continue For
            If regexToEval.Substring(i, 1) = "?" Then
                i += 1 : AddStrToTrackers("?")
                ' "(?:" is a non-capturing group: record the text but do not open a new group.
                If regexToEval.Substring(i, 1) = ":" Then i += 1 : AddStrToTrackers(":") : curChar = ":" : Continue For
                Dim NameLength As Integer = 0
                If regexToEval.Substring(i, 1) = "<" Or regexToEval.Substring(i, 1) = "'" Then
                    ' Named group: find the closing ">" or "'" to measure the name.
                    i += 1 : AddStrToTrackers(regexToEval.Substring(i - 1, 1))
                    i += 1
                    For x = i To regexToEval.Length
                        If regexToEval.Substring(x - 1, 1) = ">" Or regexToEval.Substring(x - 1, 1) = "'" Then
                            NameLength = x - i
                            Exit For
                        End If
                    Next
                Else
                    CommandGroup = True
                    Continue For
                End If
                If NameLength > 0 Then
                    Dim GroupName As String = regexToEval.Substring(i - 1, NameLength)
                    i += NameLength : curChar = regexToEval.Substring(i - 1, 1) : AddStrToTrackers(GroupName & curChar)
                    intGroups += 1
                    DictGroups.Add(intGroups, "(" & DictGroups.Count & ")<" & GroupName & ">: " & vbTab)
                    DictTrackers.Add(intGroups, True)
                    Continue For
                End If
            End If
            ' Plain "(": open a new numbered group.
            curChar = iChar
            intGroups += 1
            DictGroups.Add(intGroups, "(" & DictGroups.Count & ")<" & intGroups.ToString & ">: " & vbTab)
            DictTrackers.Add(intGroups, True)
        Next
        Dim Output As String = MakeOutput()
        Console.WriteLine(Output)
    End Sub

    Private Function MakeOutput() As String
        Dim retString As String = String.Empty
        For i = 0 To DictGroups.Count - 1
            retString &= DictGroups(i) & vbCrLf
        Next
        Return retString
    End Function

    ' Marks the highest-numbered open group as closed (group 0 is never closed).
    Public Sub EndGroup()
        If CommandGroup Then
            CommandGroup = False
            Exit Sub
        End If
        Dim HighestNum As Integer = 0
        For Each item In DictTrackers
            If Not item.Value Then Continue For
            If item.Key > HighestNum Then HighestNum = item.Key
        Next
        If HighestNum <> 0 Then DictTrackers(HighestNum) = False
    End Sub

    ' Appends the text to every group that is still open.
    Public Sub AddStrToTrackers(ByVal addString As String)
        For Each item In DictTrackers
            If item.Value Then DictGroups(item.Key) &= addString
        Next
    End Sub
End Module
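The only difference is that I'm not capturing non-capturing groups or function groups. Note that the closing parenthesis of a non-capturing group is treated as closing the innermost open capturing group, which is why the text shown for groups a, b, c, and 5 above is one closing parenthesis short. Of course, this is just quick code I wrote in about 10 minutes, but it's a start if you want it. I use the OrderedDictionary with the group numbers as keys. You could change that structure if you also wanted to include non-capturing groups and function groups in the output.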
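As a rough illustration of that last suggestion, here is a minimal sketch that keeps a stack of open groups so that non-capturing groups and option groups such as (?im) are reported too. It is written in Python only because it treats the pattern purely as text; the same loop should port to VB.NET fairly directly. The function name list_groups is hypothetical, and the sketch assumes a well-formed pattern, no RegexOptions.IgnorePatternWhitespace or ExplicitCapture, and no character class that starts with a literal "]".

```python
def list_groups(pattern):
    """Return (kind, label, text) for each parenthesised group, in order of its opening "("."""
    found = []           # [kind, label, start, end] per group
    stack = []           # indexes into `found` for groups that are still open
    numbered = 0         # unnamed capturing groups seen so far
    non_capturing = 0    # (?:...) groups seen so far
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":                       # escaped character: skip both characters
            i += 2
            continue
        if ch == "[":                        # character class: parentheses inside are literal
            i += 1
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1
            continue
        if ch == "(":
            if pattern.startswith("(?#", i):     # inline comment runs to the next ")"
                end = pattern.find(")", i)
                i = n if end == -1 else end + 1
                continue
            rest = pattern[i + 1:i + 3]
            if rest == "?:":
                non_capturing += 1
                kind, label = "non-capturing", str(non_capturing)
            elif (rest[:1] == "?" and rest[1:2] in ("<", "'")
                  and pattern[i + 3:i + 4] not in ("=", "!")):   # not a lookbehind
                closer = ">" if rest[1] == "<" else "'"
                kind, label = "named", pattern[i + 3:pattern.index(closer, i + 3)]
            elif rest[:1] == "?":
                kind, label = "other", rest      # inline options, lookarounds, atomic groups
            else:
                numbered += 1
                kind, label = "numbered", str(numbered)
            found.append([kind, label, i, None])
            stack.append(len(found) - 1)
        elif ch == ")" and stack:
            found[stack.pop()][3] = i + 1        # close the innermost open group
        i += 1
    return [(kind, label, pattern[start:end]) for kind, label, start, end in found]


if __name__ == "__main__":
    sample = r"(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))))"
    for kind, label, text in list_groups(sample):
        print(f"{kind} {label} ==> {text}")
```

Because each closing parenthesis pops whichever group was opened last, regardless of its kind, a non-capturing group nested inside a capturing group no longer shortens the text of the groups around it. Each group's text includes its own parentheses, in the form the question asks for.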
Upvotes: 3