Reputation: 666
I can't figure out what I'm doing differently in the example below. I have two strings which, from my perspective, are similar: both are plain strings. For each string I have a regex, but the first regex, /\*Hi (.*) \*,/, gives me a result where the match is nested in two arrays: [["result"]]. I need my result in just one array: ["result"]. What am I doing differently in the two examples below?
✗ irb
2.0.0p247 :001 > name_line_1 = "*Hi Peter Parker *,"
=> "*Hi Peter Parker *,"
2.0.0p247 :002 > name_line_1.scan(/\*Hi (.*) \*,/)
=> [["Peter Parker"]]
2.0.0p247 :003 > name_line_2 = "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br /><a href=\"mailto:[email protected]\">[email protected]</a><br />\r"
=> "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br /><a href=\"mailto:[email protected]\">[email protected]</a><br />\r"
2.0.0p247 :004 > name_line_2.scan(/^[^<]*/)
=> ["Peter Parker"]
Upvotes: 2
Views: 191
Reputation: 30455
scan returns an array of matches. As the other answers point out, if your regex has capturing groups (parentheses), each match is returned as an array, with one string for each capturing group within the match.
If it didn't do this, scan wouldn't be very useful, as it is very common to use capturing groups in a regex to pick out different parts of the match.
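To illustrate why the nested arrays are useful, here is a minimal sketch with a made-up input string (not from the question) and two capturing groups:

```ruby
# Hypothetical sample input, chosen only for illustration.
pairs = "a=1, b=2"

# No capturing groups: each element is the whole match as a string.
whole = pairs.scan(/\w=\d/)

# Two capturing groups: each element is an array [group 1, group 2].
parts = pairs.scan(/(\w)=(\d)/)

p whole  # prints ["a=1", "b=2"]
p parts  # prints [["a", "1"], ["b", "2"]]
```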
I suspect that scan is not really the best method for your situation. scan is useful when you want to get all the matches from a string. But in the string you show, there is only one match anyway. If you want to get a specific capturing group from the first match in a string, the easiest way is:
string[/regex/, 1] # extract the first capturing group, or nil if there is no match
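Applied to the string from the question, that could look like the sketch below. The second string is a made-up input that does not match, to show the nil case:

```ruby
name_line_1 = "*Hi Peter Parker *,"

# String#[] with a regex and a group index returns that group from the first match.
name = name_line_1[/\*Hi (.*) \*,/, 1]

# Hypothetical non-matching input: the result is nil rather than an error.
missing = "no greeting here"[/\*Hi (.*) \*,/, 1]

p name     # prints "Peter Parker"
p missing  # prints nil
```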
Another way is to do something like this:
if string =~ /regex/
  # $1 will contain the first capturing group from the first match
end
Or:
if match = string.match(/regex/)
  # match[1] will contain the first capturing group
end
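Both variants, sketched against the string from the question (the variable names are only for illustration):

```ruby
name_line_1 = "*Hi Peter Parker *,"
pattern = /\*Hi (.*) \*,/

# =~ returns the match index (or nil) and sets $1, $2, ... as a side effect.
if name_line_1 =~ pattern
  from_global = $1
end

# String#match returns a MatchData object (or nil); groups are read by index.
if (m = name_line_1.match(pattern))
  from_match_data = m[1]
end

p from_global      # prints "Peter Parker"
p from_match_data  # prints "Peter Parker"
```

The MatchData form avoids relying on the special $1 variable, which can be easier to follow in longer methods.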
If you really want to get all matches in the string, and need to use a capturing group (or feel it's more readable than using lookahead and lookbehind, which it is):
string.scan(/regex/) do |match|
  # do something with match[0]
end
Or:
string.scan(/regex/).map(&:first)
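A small sketch of both forms on a string with more than one match. The input is made up, and the group uses a lazy `.*?` (an assumption, not the question's regex) so that each greeting is matched separately:

```ruby
# Hypothetical input containing two greetings.
greetings = "*Hi Peter Parker *, *Hi Mary Jane *,"
pattern = /\*Hi (.*?) \*,/

# Block form: with one capturing group, the block receives a one-element array.
from_block = []
greetings.scan(pattern) do |match|
  from_block << match[0]
end

# Map form: take the first (and only) group from each match.
from_map = greetings.scan(pattern).map(&:first)

p from_block  # prints ["Peter Parker", "Mary Jane"]
p from_map    # prints ["Peter Parker", "Mary Jane"]
```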
Upvotes: 3
Reputation: 921
It's because you are capturing the name in name_line_1 using parentheses. This causes the scan method to return an array of arrays. If you absolutely must return a one-dimensional array, you can use lookbehind and lookahead assertions like so:
/(?<=\*Hi ).*(?= \*,)/
Or, if you find that too confusing, you could always just call .flatten on the resulting array ;-)
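Both options, sketched against the string from the question:

```ruby
name_line_1 = "*Hi Peter Parker *,"

# Lookbehind/lookahead assert the surrounding text without capturing it,
# so the regex has no groups and scan returns plain strings.
with_lookarounds = name_line_1.scan(/(?<=\*Hi ).*(?= \*,)/)

# Keep the capturing group and flatten the nested result afterwards.
flattened = name_line_1.scan(/\*Hi (.*) \*,/).flatten

p with_lookarounds  # prints ["Peter Parker"]
p flattened         # prints ["Peter Parker"]
```

Note that flatten merges all groups of all matches into one list, so it is only a clean fit when the regex has a single capturing group.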
Upvotes: 3
Reputation: 168199
The difference is that the first regex contains a capturing group, (). When a regex matches, the whole match is captured as $&, and in addition you can capture as many parts of it as you want by using (). They will be captured as $1, $2, ...
And scan behaves differently depending on whether you have $1, $2, ... When you don't, it returns an array of all the $& values. When you do have $1, $2, ..., it returns an array of [$1, $2, ...] arrays.
To avoid $1 in the first regex, you have to avoid using a capturing group.
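As a sketch of the general idea (not the answerer's original code), a non-capturing group (?:...) keeps the grouping but creates no $1, so scan goes back to returning whole matches. The input below is made up:

```ruby
# Hypothetical sample input.
text = "ab12 cd34"

# Capturing group: scan returns one array of groups per match.
captured = text.scan(/([a-z]+)\d+/)

# Non-capturing group: no $1 exists, so scan returns each whole match.
uncaptured = text.scan(/(?:[a-z]+)\d+/)

p captured    # prints [["ab"], ["cd"]]
p uncaptured  # prints ["ab12", "cd34"]
```

For the regex in the question, where only the text between the delimiters is wanted, dropping the parentheses alone would return the delimiters too, so lookaround assertions are the usual way to leave them out of the match.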
Upvotes: 0