Reputation: 25862
I am getting completely different reults from string.scan and several regex testers...
I am just trying to grab the domain from the string, it is the last word.
The regex in question:
/([a-zA-Z0-9\-]*\.)*\w{1,4}$/
The string (1 single line, verified in Ruby's runtime btw)
str = 'Show more results from software.informer.com'
Work fine, but in ruby....
irb(main):050:0> str.scan /([a-zA-Z0-9\-]*\.)*\w{1,4}$/
=> [["informer."]]
I would think that I would get a match on software.informer.com ,which is my goal.
Upvotes: 3
Views: 149
Reputation: 370112
It does not look as if you expect more than one result (especially as the regex is anchored). In that case there is no reason to use scan.
'Show more results from software.informer.com'[ /([a-zA-Z0-9\-]*\.)*\w{1,4}$/ ]
#=> "software.informer.com"
If you do need to use scan (in which case you obviously need to remove the anchor), you can use (?:)
to create non-capturing groups.
'foo.bar.baz lala software.informer.com'.scan( /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}/ )
#=> ["foo.bar.baz", "lala", "software.informer.com"]
Upvotes: 2
Reputation: 66436
How about doing this :
/([a-zA-Z0-9\-]*\.*\w{1,4})$/
This returns
informer.com
On your test string.
http://rubular.com/regexes/13670
Upvotes: 0
Reputation: 42411
You are getting a match on software.informer.com
. Check the value of $&
. The return of scan
is an array of the captured groups. Add capturing parentheses around the suffix, and you'll get the .com
as part of the return value from scan
as well.
The regex testers and Ruby are not disagreeing about the fundamental issue (the regex itself). Rather, their interfaces are differing in what they are emphasizing. When you run scan
in irb
, the first thing you'll see is the return value from scan
(an Array of the captured subpatterns), which is not the same thing as the matched text. Regex testers are most likely oriented toward displaying the matched text.
Upvotes: 2
Reputation: 29427
Your regex is correct, the result has to do with the way String#scan behaves. From the official documentation:
"If the pattern contains groups, each individual result is itself an array containing one entry per group."
Basically, if you put parentheses around the whole regex, the first element of each array in your results will be what you expect.
Upvotes: 3