Reputation: 19120
I'm using Ruby 2.4. I want to match an optional "a" or "b" character, followed by an arbitrary amount of whitespace, and then one or more digits, but none of my regexes match the string "40":
2.4.0 :017 > MY_TOKENS = ["a", "b"]
=> ["a", "b"]
2.4.0 :018 > str = "40"
=> "40"
2.4.0 :019 > str =~ Regexp.new("^[#{Regexp.union(MY_TOKENS)}]?[[:space:]]*\d+[^a-z^0-9]*$")
=> nil
2.4.0 :020 > str =~ Regexp.new("^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z^0-9]*$")
=> nil
2.4.0 :021 > str =~ Regexp.new("^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+$")
=> nil
I'm stumped as to what I'm doing wrong.
Upvotes: 1
Views: 1075
Reputation: 54223
If they are single characters, just use MY_TOKENS.join inside the character class:
MY_TOKENS = ["a", "b"]
str = "40"
first_regex = /^[#{MY_TOKENS.join}]?[[:space:]]*\d+[^a-z0-9]*$/
# /^[ab]?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ first_regex
# 0
You can also interpolate the Regexp.union result directly. This might lead to unexpected bugs, though, because the flags of the outer regexp won't apply to the inner one:
second_regex = /^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?-mix:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ second_regex
# 0
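The above regex looks a lot like what you did, but using // instead of Regexp.new means you don't have to escape the backslashes. In a double-quoted string, "\d" is just the letter d, so your Regexp.new patterns were looking for literal d characters instead of digits.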
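A minimal sketch of why the original Regexp.new patterns returned nil, using only core Ruby and a simplified pattern (the token part is left out because it is not what breaks the match):

```ruby
# In a double-quoted string, "\d" is not a recognised escape, so Ruby keeps only the "d".
broken  = Regexp.new("^[[:space:]]*\d+$")
# Doubling the backslash passes \d through to the regex engine.
fixed   = Regexp.new("^[[:space:]]*\\d+$")
# A regex literal needs only a single backslash.
literal = /^[[:space:]]*\d+$/

p broken.source    # the pattern looks for the letter d, not for digits
p "40" =~ broken   # nil
p "ddd" =~ broken  # 0
p "40" =~ fixed    # 0
p "40" =~ literal  # 0
```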
To avoid the flag problem, you could interpolate Regexp#source instead:
third_regex = /^(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ third_regex
# 0
Or simply build the alternation yourself:
fourth_regex = /^(?:#{MY_TOKENS.join('|')})?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ fourth_regex
# 0
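One difference between these two approaches: Regexp.union escapes regex metacharacters in its string arguments, while join('|') pastes them in raw. A quick check, using hypothetical tokens "a." and "b+" chosen only because they contain metacharacters:

```ruby
# Hypothetical tokens that contain regex metacharacters ("." and "+").
TOKENS = ["a.", "b+"]

joined  = /\A(?:#{TOKENS.join('|')})?[[:space:]]*\d+\z/             # "." matches any char
unioned = /\A(?:#{Regexp.union(TOKENS).source})?[[:space:]]*\d+\z/  # metacharacters escaped
escaped = /\A(?:#{TOKENS.map { |t| Regexp.escape(t) }.join('|')})?[[:space:]]*\d+\z/

p "ax 40" =~ joined   # 0, because "." matched the "x"
p "ax 40" =~ unioned  # nil
p "a. 40" =~ unioned  # 0
p "ax 40" =~ escaped  # nil
```

With plain letters, as in the question, all of these behave the same; mapping Regexp.escape over the tokens keeps the join approach safe for arbitrary input.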
The last three examples also work if MY_TOKENS contains words instead of single characters.
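A small check of that claim, with hypothetical two-letter tokens standing in for words:

```ruby
# Hypothetical multi-character tokens, used only for illustration.
MY_TOKENS = ["no", "st"]

second_regex = /^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z0-9]*$/
third_regex  = /^(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+[^a-z0-9]*$/
fourth_regex = /^(?:#{MY_TOKENS.join('|')})?[[:space:]]*\d+[^a-z0-9]*$/

["no 40", "st40", "40"].each do |s|
  # Each token is matched as a whole word, so every regex matches at index 0.
  p [s, s =~ second_regex, s =~ third_regex, s =~ fourth_regex]
end

p "n 40" =~ fourth_regex  # nil: a partial token does not count
```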
first_regex, third_regex and fourth_regex all work with the /i flag. For example:
first_regex = /^[#{MY_TOKENS.join}]?[[:space:]]*\d+[^a-z0-9]*$/i
"A 40" =~ first_regex
# 0
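second_regex is left out of that list because of the embedded flags mentioned earlier. A short sketch showing the difference with the same tokens and /i:

```ruby
MY_TOKENS = ["a", "b"]

# Interpolating the Regexp itself embeds (?-mix:a|b), which switches /i off for a|b.
second_regex = /^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z0-9]*$/i
# Interpolating only the source lets the outer /i flag apply to the tokens too.
third_regex  = /^(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+[^a-z0-9]*$/i

p second_regex.source     # "^(?-mix:a|b)?[[:space:]]*\\d+[^a-z0-9]*$"
p "A 40" =~ second_regex  # nil
p "A 40" =~ third_regex   # 0
```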
Upvotes: 3
Reputation: 626758
I believe you want to match a string that may start with any of the alternatives you defined in MY_TOKENS, followed by zero or more whitespace characters and then one or more digits up to the end of the string.
Then you need to use
Regexp.new("\\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\\d+\\z").match?(str)
or
/\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+\z/.match?(str)
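A quick sanity check that the two forms build the same pattern and agree on a few sample inputs (the inputs are made up for illustration; match? needs Ruby 2.4 or later, which the question uses):

```ruby
MY_TOKENS = ["a", "b"]

from_string  = Regexp.new("\\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\\d+\\z")
from_literal = /\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+\z/

["40", "a 40", "b40", "c 40", "40x"].each do |s|
  # Both patterns should print the same result for every input.
  puts "#{s.inspect}: #{from_string.match?(s)} #{from_literal.match?(s)}"
end
```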
When you use Regexp.new, remember to double the backslashes, because the pattern is first parsed as a double-quoted string: "\\d" becomes the digit-matching pattern \d, while "\d" is just the letter d. In regex literal notation, a single backslash is enough (/\d/).
Do not forget to match the start of the string with \A and the end of the string with \z. In Ruby, ^ and $ match at the start and end of any line, not just of the whole string.
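A minimal illustration of the difference, using a made-up two-line string:

```ruby
str = "abc\n40"

p str =~ /^\d+$/    # 4: ^ matches right after the newline
p str =~ /\A\d+\z/  # nil: \A and \z only match at the ends of the whole string
```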
Note that [...] creates a character class that matches any single char defined inside it: [ab] matches an a or b, and [program] matches one char, either p, r, o, g, a or m. If you have multicharacter sequences in MY_TOKENS, you need to remove [...] from the pattern.
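To make the character-class pitfall concrete, here is a sketch with hypothetical word tokens "cat" and "dog":

```ruby
# Hypothetical multi-character tokens.
tokens = ["cat", "dog"]

char_class  = /\A[#{tokens.join}]\z/         # [catdog]: any ONE of c, a, t, d, o, g
alternation = /\A#{Regexp.union(tokens)}\z/  # the whole word cat or dog

p "g" =~ char_class     # 0
p "cat" =~ char_class   # nil
p "cat" =~ alternation  # 0
p "g" =~ alternation    # nil
```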
To make the regex case-insensitive, pass a case-insensitive modifier to the pattern and use the #source method of the regex created by Regexp.union to drop its embedded flags (thanks, Eric):
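Regexp.new("(?i)\\A(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\\d+\\z")
or
/\A(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+\z/i
The (?:...) group is required: .source returns the bare a|b, and without the group the alternation would split the whole pattern into \Aa and b?[[:space:]]*\d+\z.
Converted with to_s, the regex literal version reads (?i-mx:\A(?:a|b)?[[:space:]]*\d+\z), where (?i-mx) means case-insensitive mode is on, while multiline (m, dot matches line breaks) and extended/verbose (x) modes are off.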
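Because Regexp#source returns the alternation without any surrounding group, interpolating it next to \A and ? needs an explicit (?:...) wrapper. A quick check of what goes wrong without it:

```ruby
MY_TOKENS = ["a", "b"]
src = Regexp.union(MY_TOKENS).source  # "a|b"

unwrapped = /\A#{src}?[[:space:]]*\d+\z/i      # parsed as \Aa | b?[[:space:]]*\d+\z
wrapped   = /\A(?:#{src})?[[:space:]]*\d+\z/i  # \A(?:a|b)?[[:space:]]*\d+\z

p "abc" =~ unwrapped  # 0: the first branch is just \Aa
p "abc" =~ wrapped    # nil
p "B 40" =~ wrapped   # 0
```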
Upvotes: 1