mbigras

Reputation: 8055

Ruby regular expression to "go until blank line"

I have the following test string:

puts "Wrong guess receives feedback"
p (game.guess(11) == "Too high!")

puts "Wrong guess deducts from remaining guesses"
p (game.remaining_guesses == 3)

In English, I'm trying to:

capture everything after "puts " 
until you get to a blank line
(aka beginning of a string followed immediately by end of string)

I know how to "go until you run into something", for example "go until you run into a double quote":

[6] pry(main)> re = /[^"]+/
=> /[^"]+/
[7] pry(main)> "stuff before a quote\"".slice(re)
=> "stuff before a quote"

I think re = /^$/ will "capture a blank line": http://rubular.com/r/E5F7wH6sNq

So how would I go about capturing "everything after 'puts ' and until you get to a blank line"? I've tried the following:

re = /puts ([^^$]+)/

But it didn't work: http://rubular.com/r/yUhA090fLm

Upvotes: 0

Views: 732

Answers (2)

Henrik N

Reputation: 16274

This would be one way to do it:

re = /puts (.*?)\r?\n\r?\n/m

.* means "zero or more of any character except newlines". I'll explain the ? later.

The m (for "multiline") modifier at the end makes it mean "zero or more of any character including newlines".
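
As a rough illustration of the difference the m modifier makes, here is a small sketch. The sample string and the variable names (text, single_line, multi_line) are made up for this example and are not from the question:

```ruby
# Hypothetical sample input: two "puts" blocks separated by a blank line.
text = "puts \"a\"\np (1 == 1)\n\nputs \"b\"\n"

# Without /m, . does not match a newline, so the capture ends at the end of the line.
single_line = text[/puts (.*)/, 1]

# With /m, . also matches newlines, so a greedy .* runs to the end of the string.
multi_line = text[/puts (.*)/m, 1]

p single_line  # "\"a\""
p multi_line   # "\"a\"\np (1 == 1)\n\nputs \"b\"\n"
```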

The \r?\n\r?\n is two newlines in a row, accounting for the fact that a newline can be represented differently on different systems. You could replace this part with $^ instead, but then the (.*?) will end up containing a trailing newline: $^ is zero-width and can only match at the start of the blank line, which comes after the newline that ends the previous line, so that newline becomes part of the capture.
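
To see the trailing-newline difference concretely, here is a minimal sketch comparing the two endings. The sample string and variable names are assumptions made for illustration:

```ruby
# Hypothetical sample input with one blank line between the blocks.
text = "puts \"a\"\np 1\n\nputs \"b\"\n"

# Ending the pattern with explicit newlines keeps them out of the capture group.
with_newlines = text[/puts (.*?)\r?\n\r?\n/m, 1]

# Ending with the zero-width anchors $^ leaves the preceding newline in the capture.
with_anchors = text[/puts (.*?)$^/m, 1]

p with_newlines  # "\"a\"\np 1"
p with_anchors   # "\"a\"\np 1\n"
```

If you prefer the anchor version, calling chomp on the capture would be one way to drop that newline.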

The ? makes .* "lazy" – it matches as little as possible, instead of being "greedy" and matching as much as possible. This is relevant if you have multiple blank lines: you presumably don't want it to greedily match "zero or more of any character until the very last blank line"; you want it to lazily match "zero or more of any character until the first blank line".
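
A quick sketch of lazy versus greedy on input with more than one blank line. The sample string and variable names here are invented for the example:

```ruby
# Hypothetical sample input: three "puts" blocks, the first two followed by a blank line.
text = "puts \"first\"\np 1\n\nputs \"second\"\np 2\n\nputs \"third\"\n"

# Greedy: .* grabs as much as it can, so the capture runs up to the LAST blank line.
greedy = text[/puts (.*)\r?\n\r?\n/m, 1]

# Lazy: .*? grabs as little as it can, so the capture stops at the FIRST blank line.
lazy = text[/puts (.*?)\r?\n\r?\n/m, 1]

# scan collects every lazy match; flatten unwraps the one-element capture arrays.
blocks = text.scan(/puts (.*?)\r?\n\r?\n/m).flatten

p greedy  # "\"first\"\np 1\n\nputs \"second\"\np 2"
p lazy    # "\"first\"\np 1"
p blocks  # ["\"first\"\np 1", "\"second\"\np 2"]
```

Note that the third block is not returned by scan in this sketch, because no blank line follows it and the pattern requires two newlines in a row.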

One of the problems with your /puts ([^^$]+)/ is that [^abc] is effectively a shortcut for "any one character that isn't a, b or c". It doesn't mean "not an 'a' followed by a 'b' followed by a 'c'".
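
To make that concrete, here is a small sketch of what [^^$] actually matches. The sample strings and variable names are assumptions for illustration only:

```ruby
# [^^$] is a character class: any single character that is not a literal ^ or $.
# Newlines are not excluded, so the class happily runs through blank lines.
text = "puts \"a\"\np 1\n\nputs \"b\"\np 2"
capture = text[/puts ([^^$]+)/, 1]

# The only thing that stops it is a literal ^ or $ character in the input.
stops_at_dollar = "puts cost: 5$ total"[/puts ([^^$]+)/, 1]

p capture          # "\"a\"\np 1\n\nputs \"b\"\np 2"
p stops_at_dollar  # "cost: 5"
```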

Upvotes: 1

Aleksei Matiushkin

Reputation: 121000

str.scan(/(?<=puts ).*?(?=\R\R|\z)/m)
#⇒ [
#    [0] "\"Wrong guess receives feedback\"\np (game.guess(11) == \"Too high!\")",
#    [1] "\"Wrong guess deducts from remaining guesses\"\np (game.remaining_guesses == 3)"
# ]

A positive lookbehind for "puts ", followed by non-greedy anything, up to either two line breaks in a row (that is, a blank line) or the end of input. Note that \R matches a line break on any platform, including the \r\n used on Windows and the \r used on classic Mac OS.
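
As a rough check of the cross-platform claim, the sketch below runs the same pattern over LF and CRLF versions of a sample string. The sample input and the names unix and windows are made up for this example:

```ruby
re = /(?<=puts ).*?(?=\R\R|\z)/m

# Hypothetical sample input with Unix (LF) line endings.
unix = "puts \"a\"\np 1\n\nputs \"b\"\np 2"
# The same input with Windows (CRLF) line endings.
windows = unix.gsub("\n", "\r\n")

# \R treats \r\n as a single line break, so the blank line is found in both cases.
p unix.scan(re)     # ["\"a\"\np 1", "\"b\"\np 2"]
p windows.scan(re)  # ["\"a\"\r\np 1", "\"b\"\r\np 2"]
```

Because the pattern uses lookarounds instead of a capture group, scan returns the matched strings directly rather than nested arrays.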

Upvotes: 1
