Marc
Marc

Reputation: 53

excel-vba regex pattern

I want the following function to validate a string if it looks like this:

between each may be a space and the order of the 2 optional parts shouldnt matter ("2 box male" == "2 male box")

but instead of doing anything useful this function excepts everything that starts with a digit :

Function validAmount(Zelle As Variant)
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Pattern = "\d\s?(pcs|pieces|piece|pc|stk|bags|bag|box|bx|boxes)?\s?(male|m|female|f)?"
If (regEx.test(Zelle)) Then
    validAmount = True
Else
    validAmount = False
End If End Function

I hope my error isn't too stupid

edit: I thought of an additional feature. how could I allow multiple entities of the pattern above seperated by a "," like "1 box female, 3 bags m, 4pcs male"

ps: both solutions work pretty well but allow something like this "1 male female" or "2 box bag"

edit2: first of all: I am really thankful for your help! i would have never solved this on my own! and I wanted to click "solved" (cant click "up" cause my reputation is still too low) as soon as everything works as intended. If I should've clicked on it earlier I'm sorry. Its my first question here and I'm a little overwhelmed of how fast your answers come :)

I think I'm unable to express my wishes :D really sorry for that! here is a third try: only (at least) one of each groups should be allowed. genuine inputs should be: "# box gender" "# gender box" "# box" "# gender" "#" but not: "# box box" or "# gender gender"

@sln: I think your first looks more like I want it but it allows two instances of the same group even though it looks for me as it shouldnt :( (same for JMax solution)

@JMax: I love your "split" solution! havent even thought of this simple trick :D I was so fixed on regex that a havent thought of anything else

Upvotes: 5

Views: 2050

Answers (2)

JMax
JMax

Reputation: 26601

Here is a try:

Function validAmount(Zelle As String)
Dim sBoxes As String, sGender As String
Dim arr() As String
Dim i As Integer

arr = Split(Zelle, ",")
sBoxes = "pcs|pieces|piece|pc|stk|bags|bag|box|bx|boxes"
sGender = "male|m|female|f"

validAmount = True
For i = 0 To UBound(arr)
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.IgnoreCase = True
    regEx.Pattern = "\d\s?(((" & sBoxes & ")?\s?(" & sGender & "))|((" & sGender & ")?\s?(" & sBoxes & ")))?$"
    If (regEx.test(arr(i))) Then
        validAmount = validAmount And True
    Else
        validAmount = validAmount And False
    End If
Next i
End Function

And here is a testing procedure:

Sub unitTest()
'could use debug.Assert too
Debug.Print (validAmount("1 box") & " should be True")
Debug.Print (validAmount("1 boxe male") & " should be False")
Debug.Print (validAmount("1 pcs female") & " should be True")
Debug.Print (validAmount("1boxes") & " should be True")
Debug.Print (validAmount("1 female pcs") & " should be True")
Debug.Print (validAmount("boxes") & " should be False")
Debug.Print (validAmount("2 male box") & " should be True")
Debug.Print (validAmount("1 mytest") & " should be False")
Debug.Print (validAmount("1 pc box") & " should be False")
Debug.Print (validAmount("1 box box") & " should be False")
Debug.Print (validAmount("1 box female, 3 bags m, 4pcs male") & " should be True")
End Sub

I've changed boxes and gender into strings so that you can add both in your pattern (I don't know any way to check the order without that trick. Does anybody have a better idea?
I also added a $ to tell Excel this was the end of the string, else any string begining with a digit would pass.

Upvotes: 3

user557597
user557597

Reputation:

This will work if your vba can at least do lookahead assertions. I thought I read it does regex like javascript. If so, this works. But disregard if not.

Its not that easy to explain, so there is a breakdown of its functionality.

^\d(?:\s?(?:(?!\1)(?:pcs|pc|pieces|piece|stk|bags|bag|boxes|box|bx)()|(?!\2)(?:male|m|female|f)()|)){2}$

Expanded:

^             # Begining of string

   \d                  # A digit
   (?:                 # Grouping
       \s?                          # Optional whitespace
       (?:                                # Grouping
           (?!\1)                                            # Never been here before, capt grp 1 is undefined
           (?:pcs|pc|pieces|piece|stk|bags|bag|boxes|box|bx) # One of these alternations is found
           ()                                                # Mark we've been here, Capt Grp 1 is defined
         |                                          # OR ..
           (?!\2)                                            # Never been here before, capt grp 2 is undefined
           (?:male|m|female|f)                               # One of these alternations is found
           ()                                                # Mark we've been here, Capt Grp 2 is defined
         |                                          # OR ..
                                                             # Nothing, this allows to pick optional \s only, or combined
       )                                 # End grouping

   ){2}                # End grouping, do exactly 2 times

$             # End of string

Edit

To address your edits in one shot, the below regexes will work. If you wanted to make either group of alternations acceptable, combine them into one alternation. There could be problems doing it this way if naming conflicts occur. This is because you are doing it twice. Also, is the spacing a big issue any more?

There are 2 regex below.

This one is very strict on spacing. Its more in line with the way you state the problem. Its not really a good idea.

^(?=\d)(?:\d(?:\s?(?:pcs|pc|pieces|piece|stk|bags|bag|boxes|box|bx|male|m|female|f|)){2}(?:,\s?(?=\d)|$))+$

This one is totaly relaxed on spacing while maintaning the corect non-whitespace form.
This is the recommended version. Feel free to change the spacing constructs in the regex.

^(?=\s*\d)(?:\s*\d(?:\s*(?:pcs|pc|pieces|piece|stk|bags|bag|boxes|box|bx|male|m|female|f|)){2}\s*(?:,(?=\s*\d)|$))+$

Here it is expanded, see the comments from the original expanded regex above for the difference.

^
   (?=\s*\d)
   (?:
      \s*
      \d
      (?:
          \s*
          (?:pcs|pc|pieces|piece|stk|bags|bag|boxes|box|bx|male|m|female|f|)
      ){2}
      \s*
      (?:
          , (?=\s*\d)
        | $
      )
   )+
$

Upvotes: 2

Related Questions