Zombies

Reputation: 25862

Very odd issue with Ruby and regex

I am getting completely different results from string.scan and several regex testers...

I am just trying to grab the domain from the string, it is the last word.

The regex in question:

/([a-zA-Z0-9\-]*\.)*\w{1,4}$/

The string (a single line, verified in Ruby's runtime, by the way):

str = 'Show more results from software.informer.com'

It works fine in the regex testers, but in Ruby...

irb(main):050:0> str.scan /([a-zA-Z0-9\-]*\.)*\w{1,4}$/
=> [["informer."]]

I would think that I would get a match on software.informer.com, which is my goal.

Upvotes: 3

Views: 149

Answers (4)

sepp2k

Reputation: 370112

It does not look as if you expect more than one result (especially as the regex is anchored). In that case there is no reason to use scan.

'Show more results from software.informer.com'[ /([a-zA-Z0-9\-]*\.)*\w{1,4}$/ ]
#=> "software.informer.com"

If you do need to use scan (in which case you obviously need to remove the anchor), you can use (?:) to create non-capturing groups.

'foo.bar.baz lala software.informer.com'.scan( /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}/ )
#=> ["foo.bar.baz", "lala", "software.informer.com"]

Upvotes: 2

marcgg

Reputation: 66436

How about doing this:

/([a-zA-Z0-9\-]*\.*\w{1,4})$/

This returns

informer.com

On your test string.

http://rubular.com/regexes/13670

Upvotes: 0

FMc

Reputation: 42411

You are getting a match on software.informer.com. Check the value of $&. The return value of scan is an array of the captured groups. Add capturing parentheses around the suffix, and you'll get the com as part of the return value from scan as well.

The regex testers and Ruby are not disagreeing about the fundamental issue (the regex itself). Rather, their interfaces differ in what they emphasize. When you run scan in irb, the first thing you'll see is the return value from scan (an Array of the captured subpatterns), which is not the same thing as the matched text. Regex testers are most likely oriented toward displaying the matched text.
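To make the difference between the matched text and the captured groups concrete, here is a small sketch using the string and regex from the question. It reads the whole match from the MatchData object (m[0], which holds the same text that $& refers to after a successful match). The variable names and the second pattern with parentheses around the suffix are only for illustration.

```ruby
str = 'Show more results from software.informer.com'
pattern = /([a-zA-Z0-9\-]*\.)*\w{1,4}$/

m = str.match(pattern)
whole_match  = m[0]  # the full matched text
last_capture = m[1]  # a repeated group keeps only its last iteration

scan_result = str.scan(pattern)  # scan returns the groups, not the match

# Same pattern with capturing parentheses around the suffix as well.
with_suffix = /([a-zA-Z0-9\-]*\.)*(\w{1,4})$/
scan_with_suffix = str.scan(with_suffix)

puts whole_match    # software.informer.com
p last_capture      # "informer."
p scan_result       # [["informer."]]
p scan_with_suffix  # [["informer.", "com"]]
```

So the regex does match the whole domain; scan simply reports the capture groups instead of the match.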

Upvotes: 2

Alex Reisner

Reputation: 29427

Your regex is correct; the result has to do with the way String#scan behaves. From the official documentation:

"If the pattern contains groups, each individual result is itself an array containing one entry per group."

Basically, if you put parentheses around the whole regex, the first element of each array in your results will be what you expect.
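As a rough illustration of that suggestion, the sketch below wraps the question's regex in one extra pair of parentheses (leaving the $ anchor outside, which is an arbitrary choice here). The outer group becomes group 1, so it is the first entry of each result array; the variable names are made up for the example.

```ruby
str = 'Show more results from software.informer.com'

# Outer parentheses are group 1; the original repeated group becomes group 2.
wrapped = /(([a-zA-Z0-9\-]*\.)*\w{1,4})$/

results = str.scan(wrapped)
domain  = results.first.first  # first group of the first (and only) result

p results    # [["software.informer.com", "informer."]]
puts domain  # software.informer.com
```

The inner group still shows up as a second entry, which is harmless if you only read the first element.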

Upvotes: 3
