Reputation: 2233
I have a list of OS names that I am trying to normalize, and I'm using regex to try to match the string. Here are the variations of the string:
microsoft windows 2016
win2016
win2012
windows 2012
windows server 2012
windows server 2008 r2
windows server 2008
windows 2019
windows 2012 r2 std
win2003
windows server 2003
windows 2003
Here is my attempt (I'm very bad at regex):
(windows.*server|((win|windows)*.(2003|2008|2012|2016|2019)))
According to the link above, it is matching n2016 and not win2016, which could be problematic as the list will consist of other operating system names. Are there any improvements that could increase the accuracy of the match?
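A small check of why the attempt lands on n2016, assuming Python's re module (the question does not name a regex flavor). Since (win|windows)* may match zero times and the following . accepts any single character, the engine can begin the match at the n:

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"(windows.*server|((win|windows)*.(2003|2008|2012|2016|2019)))")

m = attempt.search("win2016")
print(m.group(0))  # n2016

# The same looseness lets the pattern fire on non-Windows names:
# any single character followed by one of the years is enough.
other = attempt.search("ubuntu 2016")
print(other is not None)  # True
```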
Upvotes: 2
Views: 441
Reputation: 163362
Looking at the pattern you tried, to match all the windows variations and match win2016 rather than n2016, you could make use of word boundaries, an alternation and a character class to shorten the list of numbers.
You don't need any capture groups.
\bwin(?:dows)?(?:\s+server)?(?:\s*(?:200[38]|201[269]))?\b
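As a rough check, here is the pattern run against the names from the question, assuming Python's re module and re.search (both are assumptions, the answer does not specify how the pattern is applied):

```python
import re

pattern = re.compile(r"\bwin(?:dows)?(?:\s+server)?(?:\s*(?:200[38]|201[269]))?\b")

names = [
    "microsoft windows 2016", "win2016", "win2012", "windows 2012",
    "windows server 2012", "windows server 2008 r2", "windows server 2008",
    "windows 2019", "windows 2012 r2 std", "win2003",
    "windows server 2003", "windows 2003",
]

# Show which part of each name the pattern actually covers.
for name in names:
    m = pattern.search(name)
    print(f"{name!r} -> {m.group(0)!r}")
```

Two things to keep in mind with this sketch: the match stops after the year, so suffixes such as r2 or std are not part of it, and because the year group is optional, a bare windows or win also matches.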
Explanation
\bwin
A word boundary to prevent the word being part of a longer word, and match win
(?:dows)?
Optionally match dows
(?:\s+server)?
Optionally match 1+ whitespace chars and server
(?:
Non capture group
\s*
Match optional whitespace chars
(?:200[38]|201[269])
Match either 2003 2008 2012 2016 2019
)?
Close the group and make it optional
\b
A word boundary
Upvotes: 1
Reputation: 18796
If you have an exact collection of strings, just create a dictionary to map them as you please or a list to test membership!
# KeyError for missing keys; call .get() to get None instead
normalized_version = version_dict[test_input.strip().lower()]
versions = ['microsoft windows 2016', 'win2016', 'win2012', 'windows 2012', 'windows server 2012', 'windows server 2008 r2', 'windows server 2008', 'windows 2019', 'windows 2012 r2 std', 'win2003', 'windows server 2003', 'windows 2003']
if test_input.strip().lower() in versions:
    ...
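A minimal sketch of the dictionary approach. The canonical labels on the right-hand side and the normalize helper are hypothetical, chosen only for illustration; the mapping targets would be whatever normalized names you need:

```python
# Hypothetical mapping: keys are raw names from the question,
# values are made-up canonical labels (an assumption, not from the source).
version_dict = {
    "microsoft windows 2016": "windows server 2016",
    "win2016": "windows server 2016",
    "win2012": "windows server 2012",
    "windows 2012": "windows server 2012",
    "windows server 2012": "windows server 2012",
}


def normalize(raw):
    """Return the canonical label, or None when the name is unknown."""
    return version_dict.get(raw.strip().lower())


print(normalize("  WIN2016 "))    # windows server 2016
print(normalize("ubuntu 20.04"))  # None
```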
Upvotes: 0
Reputation: 103844
You could use:
/^((?:microsoft )?(?:win|windows)(?:[ ]| server )?(?:2003|2008|2012|2019|2016)(?: r2)?(?: std)?)$/
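A quick way to try this one, assuming Python's re module, where the surrounding / delimiters are dropped. Because the pattern is anchored with ^ and $, it has to cover the whole string:

```python
import re

# Same pattern as above, without the /.../ delimiters.
pattern = re.compile(
    r"^((?:microsoft )?(?:win|windows)(?:[ ]| server )?"
    r"(?:2003|2008|2012|2019|2016)(?: r2)?(?: std)?)$"
)

names = [
    "microsoft windows 2016", "win2016", "win2012", "windows 2012",
    "windows server 2012", "windows server 2008 r2", "windows server 2008",
    "windows 2019", "windows 2012 r2 std", "win2003",
    "windows server 2003", "windows 2003",
]

for name in names:
    print(name, bool(pattern.match(name)))

# Anchoring rejects names that merely contain a Windows-like fragment.
print(bool(pattern.match("darwin2016")))  # False
```

The separators are literal single spaces, so a name with doubled spaces would not match unless the input is cleaned up first.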
Upvotes: 1
Reputation: 626861
You can use
microsoft\s+windows\s+2016|win(?:20(?:03|1[26])|dows\s+(?:20(?:03|1(?:2(?:\s+r2\s+std)?|9))|server\s+20(?:0(?:3|8(?:\s+r2)?)|12)))
See the regex demo
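A short check of this pattern against the names in the question, assuming Python's re module and re.fullmatch (an assumption, the answer only links to a demo):

```python
import re

pattern = re.compile(
    r"microsoft\s+windows\s+2016|win(?:20(?:03|1[26])|dows\s+(?:20(?:03|1(?:2(?:\s+r2\s+std)?|9))|server\s+20(?:0(?:3|8(?:\s+r2)?)|12)))"
)

names = [
    "microsoft windows 2016", "win2016", "win2012", "windows 2012",
    "windows server 2012", "windows server 2008 r2", "windows server 2008",
    "windows 2019", "windows 2012 r2 std", "win2003",
    "windows server 2003", "windows 2003",
]

# Every listed name is covered in full.
print(all(pattern.fullmatch(n) for n in names))  # True

# Combinations that are not in the list are rejected.
print(pattern.fullmatch("windows 2016"))  # None
```

The pattern is effectively the listed strings folded into one alternation, so it is precise for this list but needs to be extended whenever a new variation appears.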
Details
microsoft\s+windows\s+2016
- microsoft windows 2016
|
- or
win
- win
(?:
- start of a non-capturing group:
20(?:03|1[26])
- 20, then 03 or 1 followed with 2 or 6
|
- or
dows
- dows
\s+
- one or more whitespaces
(?:
- start of a non-capturing group:
20(?:03|1(?:2(?:\s+r2\s+std)?|9))
- 20 followed with either 03 or 1 that is followed with either 2 that is followed with 1+ whitespaces and r2, whitespaces, std (optionally) or 9
|
- or
server\s+20
- server, 1+ whitespaces, 20
(?:
- start of another group
0(?:3|8(?:\s+r2)?)
- 0, then 3 or 8 followed with an optional substring of 1+ whitespaces and r2
|
- or
12
- 12
)
- end of the group
)
- end of the group
)
- end of the group
Upvotes: 1