Jwosty

Reputation: 3644

Ruby String#scan equivalent to return MatchData

As stated in the question title, is there a method on Ruby strings that is equivalent to String#scan but, instead of returning just a list of the matches, returns an array of MatchData objects? For example:

# Matches a set of characters between underscore pairs
"foo _bar_ _baz_ hashbang".some_method(/_[^_]+_/) #=> [#<MatchData "_bar_">, #<MatchData "_baz_">]

Any other way to get the same or a similar result would also be good. I want to do this to find the positions and extents of "strings" within Ruby strings, e.g. "goodbye" and "world" inside "'goodbye' cruel 'world'".

Upvotes: 13

Views: 2134

Answers (3)

mu is too short

Reputation: 434745

You could easily build your own by exploiting MatchData#end and the pos parameter of String#match. Something like this:

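def matches(s, re)
  start_at = 0
  matches  = []
  while (m = s.match(re, start_at))
    matches.push(m)
    # Resume after this match; step one character past an empty match
    # so a pattern like /x*/ cannot loop forever.
    start_at = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  matches
end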

And then:

>> matches("foo _bar_ _baz_ hashbang", /_[^_]+_/)
=> [#<MatchData "_bar_">, #<MatchData "_baz_">]
>> matches("_a_b_c_", /_[^_]+_/)
=> [#<MatchData "_a_">, #<MatchData "_c_">]
>> matches("_a_b_c_", /_([^_]+)_/)
=> [#<MatchData "_a_" 1:"a">, #<MatchData "_c_" 1:"c">]
>> matches("pancakes", /_[^_]+_/)
=> []

You could monkey patch that into String if you really wanted to.
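A minimal sketch of that monkey patch, assuming a hypothetical method name `match_all` (it is not part of Ruby's core String API). It uses the same String#match / MatchData#end loop and also steps past empty matches so the loop always makes progress:

```ruby
# Hypothetical monkey patch: String#match_all is NOT a core Ruby method.
class String
  # Returns an array of MatchData, one per non-overlapping match of re.
  def match_all(re)
    result = []
    pos = 0
    while (m = match(re, pos))
      result << m
      # Advance past the match; step one character past an empty match.
      pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
    end
    result
  end
end

"foo _bar_ _baz_ hashbang".match_all(/_[^_]+_/).map { |m| m.offset(0) }
# => [[4, 9], [10, 15]]
```

If a global patch is undesirable, the same method body could be placed in a refinement (`refine String do ... end`) and activated with `using` only where needed.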

Upvotes: 9

Kelvin

Reputation: 20912

If you don't need MatchData objects back, here's a way using StringScanner.

require 'strscan'

rxp = /_[^_]+_/
scanner = StringScanner.new "foo _barrrr_ _baz_ hashbang"
match_infos = []
until scanner.eos?
  scanner.scan_until rxp
  if scanner.matched?
    match_infos << {
      pos: scanner.pre_match.size,
      length: scanner.matched_size,
      match: scanner.matched
    }
  else
    break
  end
end

p match_infos
# [{:pos=>4, :length=>8, :match=>"_barrrr_"}, {:pos=>13, :length=>5, :match=>"_baz_"}]
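One caveat: StringScanner#matched_size is measured in bytes, while pre_match.size counts characters, so the two can disagree on non-ASCII input. A minimal variant, assuming Ruby 2.0+ for StringScanner#charpos and a regex that never matches the empty string (`match_infos` is a hypothetical helper name), reports both values in characters:

```ruby
require 'strscan'

# Hypothetical helper: returns { pos:, length:, match: } for each match of
# rxp in str, with pos and length in characters. Assumes rxp never matches
# the empty string (an empty match would not advance the scanner).
def match_infos(str, rxp)
  scanner = StringScanner.new(str)
  infos = []
  # scan_until returns nil once no further match exists.
  while scanner.scan_until(rxp)
    text = scanner.matched
    # charpos is the character offset just past the current match.
    infos << { pos: scanner.charpos - text.length, length: text.length, match: text }
  end
  infos
end

p match_infos("foo _barrrr_ _baz_ hashbang", /_[^_]+_/)
```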

Upvotes: 1

Nash Bridges

Reputation: 2378

memo = []
"foo _bar_ _baz_ hashbang".scan(/_[^_]+_/) { memo << Regexp.last_match }
 => "foo _bar_ _baz_ hashbang"
memo
 => [#<MatchData "_bar_">, #<MatchData "_baz_">]
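This works because the block form of String#scan sets `$~` before each yield, so Regexp.last_match inside the block is the MatchData for the current match. A minimal sketch applying it to the question's goal (positions and extents of single-quoted substrings), assuming a hypothetical helper `quoted_spans` and that quoted strings contain no escaped quotes:

```ruby
# Hypothetical helper: returns [text, start, stop] for each single-quoted
# substring; start/stop are character offsets of the content, quotes excluded
# (stop is exclusive, as with MatchData#end).
def quoted_spans(str)
  spans = []
  str.scan(/'([^']*)'/) do
    m = Regexp.last_match # MatchData for the current match
    spans << [m[1], m.begin(1), m.end(1)]
  end
  spans
end

p quoted_spans("'goodbye' cruel 'world'")
# prints [["goodbye", 1, 8], ["world", 17, 22]]
```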

Upvotes: 14
