Reputation: 46429
I want to scan a very long string for regex matches, and I'm wondering what the most efficient way is to find only the first N matches. For example, something like:
'abcabcabc'.scan /b/, limit: 2
would finish after examining only 5 characters, if only scan supported a limit option.
(The string is several MB - a memoized data structure in memory - and this is a web request. Perf matters.)
Upvotes: 3
Views: 609
Reputation: 9523
Fortunately, Ruby regex supports lazy matching, so you can use it like this:
'abcabcabc'.match(/(b).*?(b)/)
Adding ? after .* makes it match lazily, stopping as soon as the regex has been fulfilled. From the Regexp class repetition documentation:
Repetition is greedy by default: as many occurrences as possible are matched while still allowing the overall match to succeed. By contrast, lazy matching makes the minimal amount of matches necessary for overall success. A greedy metacharacter can be made lazy by following it with ?.
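To make the difference concrete, here is a minimal sketch comparing the lazy pattern above with its greedy counterpart on the string from the question. The helper name `match_span` is hypothetical and exists only for this illustration; it reports the matched text and the offset just past the match.

```ruby
# Hypothetical helper for illustration: returns the matched text and the
# offset just past the match, or nil if the pattern does not match.
def match_span(str, pattern)
  m = str.match(pattern)
  m && [m[0], m.end(0)]
end

str = 'abcabcabc'

# Lazy: the match ends at the second "b".
p match_span(str, /(b).*?(b)/)       # => ["bcab", 5]

# Greedy: the match extends to the last "b" in the string.
p match_span(str, /(b).*(b)/)        # => ["bcabcab", 8]

# The two capture groups hold the individual matches.
p str.match(/(b).*?(b)/).captures    # => ["b", "b"]
```

Two caveats with this approach: the pattern hard-codes N = 2 (one capture group per wanted match), and `.` does not match newlines unless the `m` flag is added, so on multi-line input the pattern would likely need to be `/(b).*?(b)/m`.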
Upvotes: 1
Reputation: 114188
Not that elegant, but you could use the block form:
str = 'abcabcabc'
result = []
str.scan(/b/) do |match|
  result << match
  break if result.size >= 2 # stop scanning once we have enough matches
end
result #=> ["b", "b"]
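The same idea can be wrapped up so the limit is a parameter. The sketch below is one possible way to do it, not a built-in: `scan_first` is a hypothetical name, and it relies on `enum_for(:scan, ...)` to turn the block form of scan into an enumerator, so that `first(limit)` stops iterating once it has collected enough matches.

```ruby
# Hypothetical helper (not a built-in String method): returns at most
# `limit` matches and stops scanning once the limit is reached.
def scan_first(str, pattern, limit)
  # Guard against zero or negative limits; first() raises on negatives.
  return [] unless limit.positive?

  # enum_for wraps the block form of scan; first(limit) breaks out of
  # the iteration after `limit` matches have been yielded.
  str.enum_for(:scan, pattern).first(limit)
end

p scan_first('abcabcabc', /b/, 2)  # => ["b", "b"]
p scan_first('abcabcabc', /b/, 5)  # => ["b", "b", "b"]
p scan_first('abcabcabc', /x/, 2)  # => []
```

As with plain scan, a pattern containing capture groups yields arrays of captures rather than the full matched strings.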
Upvotes: 3