NeoFax
NeoFax

Reputation: 25

RegEx VBA Excel complex string

I have a function pulled from here. My problem is that I don't know what RegEx pattern I need to use to split out the following data:

+1 vorpal unholy longsword +31/+26/+21/+16 (2d6+13)
+1 vorpal flaming whip +30/+25/+20 (1d4+7 plus 1d6 fire and entangle)
2 slams +31 (1d10+12)

I want it to look like:

+1 vorpal unholy longsword, 31 
+1 vorpal flaming whip, 30 
2 slams, 31

Here is the VBA code that does the RegExp validation:

Public Function RXGET(ByRef find_pattern As Variant, _
                        ByRef within_text As Variant, _
                        Optional ByVal submatch As Long = 0, _
                        Optional ByVal start_num As Long = 0, _
                        Optional ByVal case_sensitive As Boolean = True) As Variant
' RXGET - Looks for a match for regular expression pattern find_pattern
' in the string within_text and returns it if found, error otherwise.
' Optional long submatch may be used to return the corresponding submatch
' if specified - otherwise the entire match is returned.
' Optional long start_num specifies the number of the character to start
' searching for in within_text. Default=0.
' Optional boolean case_sensitive makes the regex pattern case sensitive
' if true, insensitive otherwise. Default=true.

Dim objRegex As VBScript_RegExp_55.RegExp
Dim colMatch As VBScript_RegExp_55.MatchCollection
Dim vbsMatch As VBScript_RegExp_55.Match
Dim colSubMatch As VBScript_RegExp_55.SubMatches
Dim sMatchString As String

Set objRegex = New VBScript_RegExp_55.RegExp

' Initialise Regex object
With objRegex
    .Global = False
    ' Default is case sensitive
    If case_sensitive Then
        .IgnoreCase = False
    Else: .IgnoreCase = True
    End If
    .pattern = find_pattern
End With

' Return out of bounds error
If start_num >= Len(within_text) Then
    RXGET = CVErr(xlErrNum)
    Exit Function
End If
sMatchString = Right$(within_text, Len(within_text) - start_num)

' Create Match collection
Set colMatch = objRegex.Execute(sMatchString)
If colMatch.Count = 0 Then ' No match
    RXGET = CVErr(xlErrNA)
Else
    Set vbsMatch = colMatch(0)
    If submatch = 0 Then ' Return match value
        RXGET = vbsMatch.Value
    Else
        Set colSubMatch = vbsMatch.SubMatches ' Use the submatch collection
        If colSubMatch.Count < submatch Then
            RXGET = CVErr(xlErrNum)
        Else
            RXGET = CStr(colSubMatch(submatch - 1))
        End If
    End If
End If
End Function

Upvotes: 0

Views: 3499

Answers (1)

Martin Ender
Martin Ender

Reputation: 44289

I don't know about Excel but this should get you started on the RegEx:

/(?:^|, |and |or )(\+?\d?\s?[^\+]*?) (?:\+|-)(\d+)/

NOTE: There is a slight caveat here. This will also match if an element begins with + only (not being followed by a digit).

Capture groups 1 and 2 contain the strings that go left and right of your comma (if the whole pattern has index 0). So you can something like capture[1] + ', ' + capture[2] (whatever your syntax for that is).

Here is an explanation of the regex:

/(?:^|, |and |or )         # make sure that we only start looking after
                           # the beginning of the string, after a comma, after an
                           # and or after an or; the "?:" makes sure that this
                           # subpattern is not capturing
 (\+?                      # a literal "+"
 \d+                       # at least one digit
                           # a literal space
 [^+]*?)                   # arbitrarily many non-plus characters; the ? makes it
                           # non-greedy, otherwise it might span multiple lines
                           # a literal space
 \+                        # a literal "+"
 (\d+)/                    # at least one digit (and the brakets are for capturing)

Upvotes: 3

Related Questions