AnApprentice

Reputation: 111040

Regex - Matching in Rubular but not in Ruby

Given text like:

body = 

yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada 
< via mobile device > 

Yada Yada <[email protected]> wrote:

yada yada yada yada yada yada yada yada yada 

I want to match the 2nd paragraph, so I'm doing:

body = body.split(/.* <[email protected]> wrote: .*/m).first

But that doesn't match in Ruby, even though it does in Rubular. Any ideas why? Thanks.
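Here is a minimal reproduction of the symptom. It assumes a plain `\n`-separated string shaped like the text above. The real address is obfuscated in the post, so `user@example.com` is a hypothetical stand-in. Because the pattern never matches, `split` returns the original string unchanged as its only element:

```ruby
# Hypothetical sample text; user@example.com stands in for the obfuscated address.
body = "yada yada yada\n< via mobile device >\n\n" \
       "Yada Yada <user@example.com> wrote:\n\nyada yada yada\n"

# The question's pattern: note the literal space after "wrote:".
pattern = /.* <user@example\.com> wrote: .*/m

parts = body.split(pattern)
puts parts.length         # 1: nothing was split off
puts parts.first == body  # true: .first is just the whole original text
```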

Upvotes: 0

Views: 741

Answers (2)

Alan Moore

Reputation: 75252

Try this instead:

body = body.split(/.*<[email protected]> wrote:.*/).first

The space after the first .* was unnecessary, and (as @aef pointed out) the space before the second .* was wrong, because the "wrote:" line ends with a line break, not a space (maybe there was a space there in your Rubular test).

Notice that I removed the m modifier, too. If I hadn't, the regex would have matched the whole string, so split would return an empty array (and .first would be nil). That's what Ruby calls multiline mode (and everyone else calls single-line or dot-all mode): the . matches anything, including newlines.
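The effect of the m modifier can be checked directly. This sketch uses the same hypothetical sample text (with `user@example.com` standing in for the obfuscated address) and compares the corrected pattern with and without /m:

```ruby
body = "yada yada yada\n< via mobile device >\n\n" \
       "Yada Yada <user@example.com> wrote:\n\nyada yada yada\n"

# Without /m, "." stops at newlines, so the match is confined to the
# "... wrote:" line and split keeps everything before it.
without_m = body.split(/.*<user@example\.com> wrote:.*/).first

# With /m, the leading .* can start at the very beginning and the trailing .*
# runs to the end. The whole string becomes the separator, split drops the
# resulting empty fields and returns [], so .first is nil.
with_m = body.split(/.*<user@example\.com> wrote:.*/m).first

p without_m  # => "yada yada yada\n< via mobile device >\n\n"
p with_m     # => nil
```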

EDIT: See it on ideone.com

Upvotes: 1

aef

Reputation: 4698

The line

Yada Yada <[email protected]> wrote:

ends with a line break, not a space. So your regular expression should be:

/.* <[email protected]> wrote:\n.*/m
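A quick check of the line-break point, again using a hypothetical `user@example.com` in place of the obfuscated address. The version with a space after "wrote:" fails and the version with `\n` succeeds:

```ruby
body = "Yada Yada <user@example.com> wrote:\n\nyada yada yada\n"

with_space   = /.* <user@example\.com> wrote: .*/m   # the question's pattern
with_newline = /.* <user@example\.com> wrote:\n.*/m  # this answer's pattern

p body.match?(with_space)    # => false
p body.match?(with_newline)  # => true
```

This only tests whether the pattern matches. Passing the /m version to split still runs into the dot-matches-newline behaviour described in the other answer.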

Attention: Windows systems and some protocols and formats, like HTML, can use different line-break encodings (for example \r\n instead of \n). To be safe, convert your input to Unix line breaks first, and then do the data extraction. You could use my linebreak gem for this.
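If you'd rather not add a dependency, plain `String#gsub` can do the same normalization. The sketch below does not use the linebreak gem's API. `normalize_line_breaks` is a hypothetical helper name, and `user@example.com` is a stand-in address. It converts `\r\n` and lone `\r` to `\n` before matching:

```ruby
# Hypothetical helper: turn Windows (\r\n) and old Mac (\r) line breaks into \n.
def normalize_line_breaks(text)
  text.gsub(/\r\n?/, "\n")
end

raw  = "Yada Yada <user@example.com> wrote:\r\n\r\nyada yada yada\r\n"
body = normalize_line_breaks(raw)

p raw.match?(/<user@example\.com> wrote:\n/)   # => false (a \r sits before the \n)
p body.match?(/<user@example\.com> wrote:\n/)  # => true
```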

Upvotes: 1
