Reputation: 35
I've been trying to parse a command by using a regular expression for a good day now.
I came close to a solution a few times, but there's always a small bit that messes things up.
I'm trying to keep the expression generic, because I'd like to use it on different commands, although the number of parameters is the same.
I basically have a maximum of 4 capture groups:
Number1 & Number2 are separated by a '-' (= optional, without a '-' no Number2)
Number1 (+ Number2) and Param1 are separated by a space (= mandatory)
Param1 & Param2 are separated by a space (= optional)
So the basic layout of the command would be:
[Number1]-[Number2] [Param1] [Param2]
Here's a list of example input that can be expected:
123456A789C test
123456.789C-987654Z321Y test
123456.789C test1 test2
123456.789C-987654Z321Y test1 test2
I managed to cook up a regex for the above examples, since they're fairly simple. However, it then occurred to me that Param1 and Param2 could be sentences. So we decided that if Param1 or Param2 contains spaces, it should be enclosed in quotes ("). However, we still want to allow single-word data without quotes, so the quotes become optional.
1-2 "test1" "test2" could also be entered as 1-2 test1 test2
1-2 "test1 test2" "test2 test3" can't be entered as 1-2 test1 test2 test3 test4
An example input:
123456.789C-987654Z321Y "test1 test2" "test3 test4"
And this is where I can't get my regex to work properly. As soon as I start making certain parts optional, it doesn't behave like I want it to behave.
The following regex is what I came up with that matches most of the situations:
(?i)^(?<numbers>(?<number1>[^\s]*?)(?:[-](?<number2>[^\s]*?))?)\s(?<params>("?)(?<param1>[^"]*)\1\s("?)(?<param2>[^"]*)\2)$
However, it doesn't accept any of these: 1 test, 1-2 test, 1 "test", 1-2 "test", 1 "test test", 1-2 "test test"
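To make this easy to reproduce, here is a minimal sketch in Python that runs the pattern above against the inputs it accepts and the ones it rejects. It is a translation, not the original: Python writes named groups as (?P<name>...) and numbers named and unnamed groups together, so I am assuming the \1 and \2 in my pattern point at the two ("?) groups (as they do in a .NET-style engine) and have given those groups the made-up names q1 and q2.

```python
import re

# Translation of the pattern above into Python's re syntax.
# Assumption: \1 and \2 in the original refer to the two unnamed ("?) groups;
# here they are named q1 and q2 (hypothetical names) and referenced by name.
PATTERN = re.compile(
    r'^(?P<numbers>(?P<number1>[^\s]*?)(?:[-](?P<number2>[^\s]*?))?)'
    r'\s(?P<params>(?P<q1>"?)(?P<param1>[^"]*)(?P=q1)'
    r'\s(?P<q2>"?)(?P<param2>[^"]*)(?P=q2))$',
    re.IGNORECASE,
)

# Inputs with two parameters, which the pattern accepts.
ACCEPTED = [
    '1-2 test1 test2',
    '123456.789C test1 test2',
    '1-2 "test1 test2" "test3 test4"',
    '123456.789C-987654Z321Y "test1 test2" "test3 test4"',
]

# Inputs with a single parameter, which the pattern rejects.
REJECTED = [
    '1 test',
    '1-2 test',
    '1 "test"',
    '1-2 "test"',
    '1 "test test"',
    '1-2 "test test"',
]

for command in ACCEPTED + REJECTED:
    verdict = 'match' if PATTERN.match(command) else 'no match'
    print(f'{command!r}: {verdict}')
```

Every rejected input has only one parameter; every accepted one has two.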
Can some regex pro help me out here and explain where I went wrong with my expression?
Here's another regex that I used as my starting point to match the most complete command, e.g. 1-2 "test1 test2" "test3 test4"
(?i)^(?<numbers>(?<number1>.*)-(?<number2>.*))\s(?<params>"(?<param1>[^"]*)"\s"(?<param2>[^"]*)")$
Upvotes: 0
Views: 68
Reputation: 5774
For the sake of clarity I removed all the named groups from the example.
I came up with this:
^([a-zA-Z0-9.]+)(-([a-zA-Z0-9.]+))?\s(([a-zA-Z][a-zA-Z0-9]*)|"[a-zA-Z 0-9]+")(\s(([a-zA-Z][a-zA-Z0-9]*)|"[a-zA-Z 0-9]+"))?$
(See source)
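Here is a breakdown:
^
matches the beginning of a line. Make sure the function you use can provide the options you want (case-insensitive, global match, multi-line); I set those options on the regex engine to keep the pattern simple.
([a-zA-Z0-9.]+)
matches the mandatory first number, and
(-([a-zA-Z0-9.]+))?
matches the optional second number.
\s
matches the whitespace between the numbers and the first parameter. Note that the [^\s] in your pattern is valid in most flavors and means the same as \S; only POSIX bracket expressions treat the backslash and the s inside [] literally.
(([a-zA-Z][a-zA-Z0-9]*)|"[a-zA-Z 0-9]+")
takes care of the first parameter. The second part of the alternation adds the quotes (") and the space to the allowed characters.
(\s(([a-zA-Z][a-zA-Z0-9]*)|"[a-zA-Z 0-9]+"))?
does the same for the optional second parameter, together with the whitespace in front of it.
$
matches the end of the line.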
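As a quick check of the breakdown, here is a minimal Python sketch that applies the pattern (including the trailing $ from the breakdown) to a few of the inputs from the question. The parse helper and the choice of group numbers 1, 3, 4 and 7 are mine, worked out by counting the opening parentheses; treat it as an illustration rather than a finished parser.

```python
import re

# The pattern from this answer, unchanged apart from being split over lines.
PATTERN = re.compile(
    r'^([a-zA-Z0-9.]+)(-([a-zA-Z0-9.]+))?'
    r'\s(([a-zA-Z][a-zA-Z0-9]*)|"[a-zA-Z 0-9]+")'
    r'(\s(([a-zA-Z][a-zA-Z0-9]*)|"[a-zA-Z 0-9]+"))?$'
)


def parse(command):
    """Return (number1, number2, param1, param2), or None if there is no match.

    Hypothetical helper: groups 1, 3, 4 and 7 hold the four values;
    optional parts that are absent come back as None.
    """
    m = PATTERN.match(command)
    if m is None:
        return None
    number1, number2, param1, param2 = m.group(1, 3, 4, 7)

    def unquote(value):
        # The quoted alternative captures the surrounding quotes, so drop them.
        return value.strip('"') if value is not None else None

    return number1, number2, unquote(param1), unquote(param2)


for command in [
    '123456A789C test',
    '123456.789C-987654Z321Y test1 test2',
    '1 "test test"',
    '123456.789C-987654Z321Y "test1 test2" "test3 test4"',
    '1-2 test1 test2 test3 test4',
]:
    print(repr(command), '->', parse(command))
```

Note that the unquoted alternative has to start with a letter, so a parameter such as 1test would need quotes with this pattern.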
Upvotes: 1
Reputation: 35400
(?<N1>[^- ]+)(-(?<N2>[^ ]+))? (?<P1>("[^"]+")|([^ ]+))( (?<P2>("[^"]+")|([^ ]+)))?
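A minimal sketch of how this could be used, written in Python as an assumption about the target language. Python spells named groups (?P<name>...), and since the pattern has no ^ or $, the sketch uses fullmatch so that the whole command has to fit. The parse_command helper is a hypothetical name, and stripping the quotes from P1 and P2 is my addition.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(
    r'(?P<N1>[^- ]+)(-(?P<N2>[^ ]+))? '
    r'(?P<P1>("[^"]+")|([^ ]+))'
    r'( (?P<P2>("[^"]+")|([^ ]+)))?'
)


def parse_command(command):
    """Return a dict with N1, N2, P1, P2 (None when absent), or None on failure."""
    m = PATTERN.fullmatch(command)  # fullmatch stands in for the missing ^...$
    if m is None:
        return None
    result = m.groupdict()  # only the named groups: N1, N2, P1, P2
    for key in ('P1', 'P2'):
        if result[key] is not None:
            # A quoted parameter is captured with its quotes; remove them.
            result[key] = result[key].strip('"')
    return result


print(parse_command('1 test'))
print(parse_command('123456.789C-987654Z321Y "test1 test2" "test3 test4"'))
print(parse_command('1-2 test1 test2 test3 test4'))
```

The quoted alternative comes first in each alternation, so a parameter that starts with a quote is read up to its closing quote before the engine falls back to a single unquoted word.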
Upvotes: 1