Reputation: 49
I have written a regular expression in Ruby that works fine on a single line, but it is quite long, so I want to write it in multi-line form. I am using the %r{}x format to spread it over multiple lines, but it is not working.
regex = (/\A(RM|R1)([A-Z])([A-Z])(\d+)(\d\d+)([A-Z])([A-Z])([A-Z]+)-?(\d+)([A-Z])(\d)#?([A-Z])([A-Z])(\d)\z/)
on a single line
regex = %r{
([A-Z])
([A-Z])
([A-Z])
(\d+)
(\d\d+)
([A-Z])
([A-Z])
([A-Z]+)
-?
(\d+)
([A-Z])
(\d)
#?
([A-Z])
([A-Z])
(\d)
}x
on multiple lines (one group per line)
What is going wrong with my approach?
Upvotes: 1
Views: 301
Reputation: 110675
Here is your regular expression defined in free-spacing mode, which is what I think you are looking for.
regex = /
\A # beginning of string
(RM|R1) # match 'RM' or 'R1' CG 1
([A-Z]) # match 1 uppercase letter CG 2
([A-Z]) # match 1 uppercase letter CG 3
(\d+) # match > 0 digits CG 4
(\d{2,}) # match > 1 digits CG 5
([A-Z]) # match 1 uppercase letter CG 6
([A-Z]) # match 1 uppercase letter CG 7
([A-Z]+) # match > 0 uppercase letters CG 8
-? # optionally match '-'
(\d+) # match > 0 digits CG 9
([A-Z]) # match 1 uppercase letter CG 10
(\d) # match 1 digit CG 11
\#? # optionally match '#'
([A-Z]) # match 1 uppercase letter CG 12
([A-Z]) # match 1 uppercase letter CG 13
(\d) # match 1 digit CG 14
\z # end of string
/x # free-spacing regex definition mode
"CG" is for "capture group". One of the main uses of free-spacing mode is to document the regex, as I've done here.
I've made two changes to your regex. Firstly, I've replaced (\d\d+) with (\d{2,}), which has the same effect but arguably reads better. Secondly, the character "#" begins a comment in free-spacing mode, so it must be escaped (\#) if it is to be matched.
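To see the effect of an unescaped "#", here is a minimal sketch using a cut-down pattern. The names broken and fixed and the two-group pattern are mine, for illustration only. In the first regex, #? is read as a comment, so what remains of the pattern no longer allows a "#" between the digit and the letter.

```ruby
# Unescaped '#': everything from '#' to the end of that line is a comment,
# so this pattern is effectively /\A(\d)([A-Z])\z/.
broken = /
  \A
  (\d)
  #?
  ([A-Z])
  \z
/x

# Escaped '#': the pattern optionally matches a literal '#'.
fixed = /
  \A
  (\d)
  \#?
  ([A-Z])
  \z
/x

broken.match?("7#H") #=> false
broken.match?("7H")  #=> true
fixed.match?("7#H")  #=> true
fixed.match?("7H")   #=> true
```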
As an example of the use of this regex,
test_str = "RMAB12345CDEF-6G7#HI8"
m = test_str.match regex
#=> #<MatchData "RMAB12345CDEF-6G7#HI8" 1:"RM" 2:"A" 3:"B" 4:"123" 5:"45"
# 6:"C" 7:"D" 8:"EF" 9:"6" 10:"G" 11:"7" 12:"H" 13:"I" 14:"8">
m.captures
#=> ["RM", "A", "B", "123", "45", "C", "D", "EF", "6", "G", "7", "H", "I", "8"]
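Notice that the regex itself does not say how the 5 digits are to be divided between capture groups 4 and 5. Because quantifiers are greedy, group 4 takes as many digits as it can while still leaving the two digits that group 5 requires.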
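The split between two adjacent digit groups can be checked in isolation. The following is only a sketch: the names greedy and lazy and the string "12345" are made up for illustration, and the lazy variant is not part of the original regex. It is shown only to contrast with the default greedy behaviour.

```ruby
# Greedy (default): the first group takes as much as it can,
# leaving the minimum of two digits for the second group.
greedy = /\A(\d+)(\d{2,})\z/
"12345".match(greedy).captures #=> ["123", "45"]

# Lazy first group (\d+?): the first group takes as little as it can,
# so the second group gets the rest.
lazy = /\A(\d+?)(\d{2,})\z/
"12345".match(lazy).captures #=> ["1", "2345"]
```

If the division between the two groups matters, fixed counts such as (\d{3})(\d{2}) would make it explicit, assuming the lengths are known in advance.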
There is one thing you must be careful about when using free-spacing mode. All whitespace outside character classes is removed before the expression is parsed, including any spaces you want matched. For example,
"ab c".match? /ab c/ #=> true
"ab c".match? /ab c/x #=> false
"abc".match? /ab c/x #=> true
Here are some ways to protect the space character (all return true):
"ab c".match? /ab\ c/x # escape a space character
"ab c".match? /ab[ ]c/x # put in a character class
"ab c".match? /ab[[:space:]]c/x # POSIX bracket expression
"ab c".match? /ab\p{Space}c/x # Unicode \p{} construct
"ab c".match? /ab\sc/x # match a whitespace character
Note that \s matches tabs, newlines and a few other whitespace characters (such as carriage returns and form feeds) as well as spaces, which may or may not be desired.
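The difference shows up as soon as the input contains whitespace other than a space. In this small sketch the variable tabbed is a made-up test string containing a tab. The escaped space and the [ ] character class match only a literal space, whereas the other three forms accept any whitespace character.

```ruby
tabbed = "ab\tc" # a tab, not a space, between "ab" and "c"

tabbed.match?(/ab\sc/x)          #=> true
tabbed.match?(/ab[[:space:]]c/x) #=> true
tabbed.match?(/ab\p{Space}c/x)   #=> true
tabbed.match?(/ab\ c/x)          #=> false
tabbed.match?(/ab[ ]c/x)         #=> false
```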
Upvotes: 2
Reputation: 626738
You should escape the # symbol as, in free-spacing mode, it denotes a comment start:
Literal white space inside the pattern is ignored, and the octothorpe (#) character introduces a comment until the end of the line. This allows the components of the pattern to be organized in a potentially more readable fashion.
So, replace #? with \#?.
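Applied to the %r{}x form from the question, the change might look like the sketch below. Note one assumption: the leading \A and (RM|R1) and the trailing \z are taken from the single-line version, since the multi-line snippet in the question does not show them. The two test strings are made up for illustration.

```ruby
regex = %r{
  \A
  (RM|R1)
  ([A-Z])
  ([A-Z])
  (\d+)
  (\d\d+)
  ([A-Z])
  ([A-Z])
  ([A-Z]+)
  -?
  (\d+)
  ([A-Z])
  (\d)
  \#?        # escaped, so a literal '#' is optionally matched
  ([A-Z])
  ([A-Z])
  (\d)
  \z
}x

regex.match?("RMAB12345CDEF-6G7#HI8") #=> true
regex.match?("RMAB12345CDEF6G7HI8")   #=> true ('-' and '#' are optional)
```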
Upvotes: 2