Reputation: 21884
Validate that a string matches this format: /^(#\d\s*)+$/ (#1 #2 for instance).
Grab all the numbers with the hash, something like #<MatchData "1234" 1:"#1" 2:"#2">. It doesn't have to be a MatchData object; any type of array or enumerable would work.
When using match, the capture group only keeps the last occurrence:
/^(#\d\s*)+$/.match "#1 #2"
# => #<MatchData "#1 #2" 1:"#2">
When I use scan, it "works":
"#1 #2".scan /#\d/
# => ["#1", "#2"]
But I don't believe I can validate the format of the string that way, as it would return the same result for "aaa #1 #2".
Can I, with only one method call, both validate that my string matches /^(#\d\s*)+$/ AND grab all the instances of #number?
I kinda feel bad about asking this since I've been using Ruby for a while now. It seems simple, but I can't get it to work.
Upvotes: 3
Views: 330
Reputation: 110675
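def doit(str)
  # Build one "(#\d)\s*" capture group per '#' found in the string.
  r = /\A#{"(#\\d)\\s*"*str.count('#')}\z/
  # captures is only called when match succeeds; otherwise nil is returned.
  str.match(r)&.captures
end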
doit "#1#2 #3 " #=> ["#1", "#2", "#3"]
doit " #1#2 #3 " #=> nil
Notice that the regular expression depends only on the number of instances of the character '#' in the string. As that number is three in both examples, the respective regular expressions are equal, namely:
/\A(#\d)\s*(#\d)\s*(#\d)\s*\z/
This regular expression was constructed as follows.
str = "#1#2 #3 "
n = str.count('#')
#=> 3
s = "(#\\d)\\s*"*n
#=> "(#\\d)\\s*(#\\d)\\s*(#\\d)\\s*"
/\A#{s}\z/
#=> /\A(#\d)\s*(#\d)\s*(#\d)\s*\z/
The regular expression reads, "match the beginning of the string, followed by three identical capture groups, each optionally followed by whitespace, followed by the end of the string". The regular expression therefore both tests the validity of the string and extracts the desired matches in the capture groups.
The safe navigation operator, &., is needed in the event that there is no match (match returns nil).
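As a minimal sketch of why the safe navigation operator matters here (it is available from Ruby 2.3 on), the following compares a matching and a non-matching string. The two-group regular expression, the sample strings and the variable names are chosen purely for illustration.

```ruby
# Illustrative regex with two capture groups (an assumption for this sketch).
r = /\A(#\d)\s*(#\d)\s*\z/

valid   = "#1 #2".match(r)      # a MatchData object
invalid = "aaa #1 #2".match(r)  # nil, because the string does not match

# &. only calls captures when the receiver is not nil.
valid_captures   = valid&.captures    #=> ["#1", "#2"]
invalid_captures = invalid&.captures  #=> nil

# Without &. the second call would raise NoMethodError on nil.
```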
A comment by the OP refers to a generalisation of the question in which the pound character ('#') is optional. That can be dealt with by modifying the regular expression as follows.
def doit(str)
  # Build one group per digit found in the string; the '#' is optional.
  r = /\A#{"(?:#?(\\d)(?=#|\\s+|\\z)\\s*)"*str.count('0123456789')}\z/
  str.match(r)&.captures
end
doit "1 2 #3 " #=> ["1", "2", "3"]
doit "1 2 #3 " #=> ["1", "2", "3"]
doit "1#2" #=> ["1", "2"]
doit " #1 2 #3 " #=> nil
doit "#1 2# 3 " #=> nil
doit " #1 23 #3 " #=> nil
For strings containing three digits the regular expression is:
/\A(?:#?(\d)(?=#|\s+|\z)\s*)(?:#?(\d)(?=#|\s+|\z)\s*)(?:#?(\d)(?=#|\s+|\z)\s*)\z/
While it is true that this regular expression can potentially be quite long, that does not necessarily mean that it would be relatively inefficient, as the lookaheads are quite localized.
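The construction of the generalised regular expression can be retraced step by step, in the same way as for the first version. This is only a sketch; the variable names (group, n, captures) are hypothetical and the sample string is one of the inputs used above.

```ruby
str = "1 2 #3 "                # sample input
n = str.count('0123456789')    # number of digit characters
#=> 3

# One non-capturing group per digit: optional '#', one captured digit,
# a lookahead requiring '#', whitespace or end of string, then whitespace.
group = "(?:#?(\\d)(?=#|\\s+|\\z)\\s*)"

r = /\A#{group * n}\z/
captures = str.match(r)&.captures
#=> ["1", "2", "3"]
```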
Upvotes: 1
Reputation: 626747
Yes, you may use
s.scan(/(?:\G(?!\A)|\A(?=(?:#\d\s*)*\z))\s*\K#\d/)
See the regex demo
Details
(?:\G(?!\A)|\A(?=(?:#\d\s*)*\z)) - two alternatives:
  \G(?!\A) - the end of the previous successful match
  | - or
  \A(?=(?:#\d\s*)*\z) - start of string (\A) that is followed by 0 or more repetitions of # + digit + 0+ whitespace chars, and then by the end of string
\s* - 0+ whitespace chars
\K - match reset operator discarding the text matched so far
#\d - a # char and then a digit
In short: the start of string position is matched first, but only if the string to the right (i.e. the whole string) matches the pattern you want. Since that check is performed with a lookahead, the regex index stays where it was, and every subsequent match can ONLY start right after the previous successful match thanks to the \G operator (it matches the start of string or the end of the previous match, so (?!\A) is used to subtract the start of string position).
rx = /(?:\G(?!\A)|\A(?=(?:#\d\s*)*\z))\s*\K#\d/
p "#1 #2".scan(rx)
# => ["#1", "#2"]
p "#1 NO #2".scan(rx)
# => []
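Note that an invalid string yields an empty array rather than nil. If nil is preferred for invalid input, as with match, the scan call could be wrapped as in the sketch below. The constant and method names (HASH_NUMBERS, hash_numbers) are hypothetical, and treating an empty result as invalid is an assumption: because the lookahead uses *, an empty string also produces [].

```ruby
# Same pattern as above, stored in a hypothetical constant.
HASH_NUMBERS = /(?:\G(?!\A)|\A(?=(?:#\d\s*)*\z))\s*\K#\d/

# Returns the array of "#<digit>" tokens, or nil when nothing was found
# (which is the case for every string that fails the validation lookahead).
def hash_numbers(str)
  found = str.scan(HASH_NUMBERS)
  found.empty? ? nil : found
end

p hash_numbers("#1 #2")      # valid input
p hash_numbers("aaa #1 #2")  # invalid input
```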
Upvotes: 3