Reputation: 666

Two strings evaluated by regex, but one of the scan results is being put into an extra array?

I can't figure out what I'm doing differently in the two examples below. I have two strings which, from my perspective, are similar: both are plain strings. I have a regex for each string, but the first regex, /\*Hi (.*) \*,/, gives me a result where the match is nested in two arrays: [["result"]]. I need the result in just one array: ["result"]. What is the difference between the two examples?

✗ irb
2.0.0p247 :001 > name_line_1 = "*Hi Peter Parker *,"
 => "*Hi Peter Parker *," 
2.0.0p247 :002 > name_line_1.scan(/\*Hi (.*) \*,/)
 => [["Peter Parker"]] 
2.0.0p247 :003 > name_line_2 = "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br /><a href=\"mailto:[email protected]\">[email protected]</a><br />\r"
 => "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br /><a href=\"mailto:[email protected]\">[email protected]</a><br />\r" 
2.0.0p247 :004 > name_line_2.scan(/^[^<]*/)
 => ["Peter Parker"]

Upvotes: 2

Answers (3)

Alex D

Reputation: 30455

scan returns an array of matches. As the other answers point out, if your regex has capturing groups (parentheses), each match is itself returned as an array, with one string for each capturing group within the match.

If it didn't do this, scan wouldn't be very useful, as it is very common to use capturing groups in a regex to pick out different parts of the match.
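To illustrate why the nested arrays are useful, here is a small sketch with two capturing groups. The sample string and the constant names are made up for the example and are not from the question.

```ruby
# Hypothetical sample data, not taken from the question.
PAIRS = "width=80 height=24"

# Two capturing groups: every match becomes a [key, value] sub-array.
WITH_GROUPS = PAIRS.scan(/(\w+)=(\d+)/)

# No capturing groups: every match is just the matched string.
WITHOUT_GROUPS = PAIRS.scan(/\w+=\d+/)

p WITH_GROUPS     # [["width", "80"], ["height", "24"]]
p WITHOUT_GROUPS  # ["width=80", "height=24"]
```

With a single group you get the same shape, just with one-element sub-arrays, which is the [["Peter Parker"]] from the question.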

I suspect that scan is not really the best method for your situation. scan is useful when you want to get all the matches from a string. But in the string you show, there is only one match anyways. If you want to get a specific capturing group from the first match in a string, the easiest way is:

 string[/regex/, 1] # extract the first capturing group, or nil if there is no match

Another way is to do something like this:

 if string =~ /regex/
   # $1 will contain the first capturing group from the first match
 end

Or:

 if match = string.match(/regex/)
   # match[1] will contain the first capturing group
 end
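Applied to the string from the question, the three single-match approaches above could look roughly like this. The method and constant names are only for illustration.

```ruby
NAME_LINE = "*Hi Peter Parker *,"
NAME_RE   = /\*Hi (.*) \*,/

# String#[] with a regex and a group index; nil if there is no match.
def name_via_index(line)
  line[NAME_RE, 1]
end

# =~ sets $1 to the first capturing group when the match succeeds.
def name_via_global(line)
  line =~ NAME_RE ? $1 : nil
end

# String#match returns a MatchData object, or nil if there is no match.
def name_via_match(line)
  m = line.match(NAME_RE)
  m && m[1]
end

p name_via_index(NAME_LINE)   # "Peter Parker"
p name_via_global(NAME_LINE)  # "Peter Parker"
p name_via_match(NAME_LINE)   # "Peter Parker"
```

All three return a plain string rather than an array, so wrap the result in [...] if you really need ["Peter Parker"].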

If you really want to get all matches in the string, and need to use a capturing group (or feel it's more readable than using lookahead and lookbehind, which it is):

 string.scan(/regex/) do |match|
   # do something with match[0]
 end

Or:

 string.scan(/regex/).map(&:first)
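As a sketch of both variants with more than one match, assume a made-up line containing two greetings. Note that the group is changed to the non-greedy (.*?) here, otherwise the first match would swallow everything up to the last " *,".

```ruby
# Hypothetical input with two greetings; not from the question.
GREETINGS   = "*Hi Peter Parker *, *Hi Mary Jane *,"
GREETING_RE = /\*Hi (.*?) \*,/

# Block form: each match arrives as an array of its capturing groups.
def names_with_block(text)
  names = []
  text.scan(GREETING_RE) { |match| names << match[0] }
  names
end

# Array form: take the first (and only) group of every match.
def names_with_map(text)
  text.scan(GREETING_RE).map(&:first)
end

p names_with_block(GREETINGS)  # ["Peter Parker", "Mary Jane"]
p names_with_map(GREETINGS)    # ["Peter Parker", "Mary Jane"]
```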

Upvotes: 3

Doydle

Reputation: 921

It's because you are capturing the name in name_line_1 using parentheses. This causes the scan method to return an array of arrays. If you absolutely must return a one-dimensional array, you can use lookbehind and lookahead like so:

/(?<=\*Hi ).*(?= \*,)/

Or, if you find that too confusing, you could always just call .flatten on the resulting array ;-)
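For reference, here is roughly what both options give on the string from the question. The method names are only for illustration.

```ruby
NAME_LINE_1 = "*Hi Peter Parker *,"

# Lookbehind/lookahead: no capturing group, so scan returns a flat array.
def names_with_lookaround(line)
  line.scan(/(?<=\*Hi ).*(?= \*,)/)
end

# Capturing group plus flatten: collapse the nested array afterwards.
def names_with_flatten(line)
  line.scan(/\*Hi (.*) \*,/).flatten
end

p names_with_lookaround(NAME_LINE_1)  # ["Peter Parker"]
p names_with_flatten(NAME_LINE_1)     # ["Peter Parker"]
```

Keep in mind that flatten also merges the groups of different matches into one list, so it is only a clean fit when the regex has a single capturing group.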

Upvotes: 3

sawa

Reputation: 168199

The difference is that the first regex contains a capturing group, written with (). When a regex matches, the whole match is captured as $&, and in addition to that, you can capture as many parts of it as you want by using (). Those parts are captured as $1, $2, ...

scan behaves differently depending on whether the regex has $1, $2, ... When it doesn't, scan returns an array of all the $& values. When it does, scan returns an array of [$1, $2, ...] arrays.

In order to avoid $1 in the first regex, you have to avoid using a capturing group.
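A quick way to see that it is the capturing group, and not the parentheses as such, that changes the result: a non-capturing group (?:...) leaves scan in its "whole match" mode. This is only a sketch and the constant names are made up.

```ruby
LINE = "*Hi Peter Parker *,"

# Capturing group: each element is an array of the groups, [$1].
CAPTURING = LINE.scan(/\*Hi (.*) \*,/)

# Non-capturing group: each element is the whole match, i.e. $&.
NON_CAPTURING = LINE.scan(/\*Hi (?:.*) \*,/)

p CAPTURING      # [["Peter Parker"]]
p NON_CAPTURING  # ["*Hi Peter Parker *,"]
```

Note that the second form returns the full matched text including "*Hi" and "*,", so on its own it does not isolate the name.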

Upvotes: 0
