bazinga012

Reputation: 753

Multiline regex in Ruby

I want to extract all the text between two keywords (<<-DOC and DOC) in a file. For example, suppose my file content is as follows:

abc.rb

def abc
    <<-DOC abc:
        return "hahaha"
    DOC
    puts "hahaha"
end

def efg
    <<-DOC efg:
        return "hehehe"
    DOC
    puts "hehehe"
end

I want to get two matches:

<<-DOC abc:
    return "hahaha"
DOC

and

<<-DOC efg:
    return "hehehe"
DOC

I tried File.read("abc.rb").match(/<<-DOC(.*?)DOC/m), but it gives me all the text between the first occurrence of <<-DOC (inside abc) and the last occurrence of DOC (inside efg).

Upvotes: 1

Views: 186

Answers (2)

Aleksei Matiushkin

Reputation: 121020

Flip-flop solution:

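# The flip-flop turns on at a line containing '<<-DOC' and turns off after the next line containing 'DOC'.
# chomp strips the trailing newlines so the result matches the output shown below.
File.readlines("abc.rb").map(&:chomp).select do |line|
  true if (line.include? '<<-DOC')...(line.include? 'DOC')
end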
#⇒ [
#     [0] "    <<-DOC abc:",
#     [1] "        return \"hahaha\"",
#     [2] "    DOC",
#     [3] "    <<-DOC efg:",
#     [4] "        return \"hehehe\"",
#     [5] "    DOC"
# ]
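
The array above is flat, while the question asks for two separate matches. Below is a minimal sketch of one way to regroup the selected lines, assuming every block starts on a line containing <<-DOC. The file content is inlined as a hypothetical `source` string so the snippet runs on its own, and `selected` and `blocks` are illustrative names:

```ruby
# Sample input mirroring abc.rb from the question (inlined for illustration).
source = <<~'SRC'
  def abc
      <<-DOC abc:
          return "hahaha"
      DOC
      puts "hahaha"
  end

  def efg
      <<-DOC efg:
          return "hehehe"
      DOC
      puts "hehehe"
  end
SRC

# Same flip-flop selection as above, on chomped lines.
selected = source.each_line.map(&:chomp).select do |line|
  true if (line.include? '<<-DOC')...(line.include? 'DOC')
end

# Start a new group at every line that opens a block, then join each group.
blocks = selected
  .slice_before { |line| line.include?('<<-DOC') }
  .map { |group| group.join("\n") }

puts blocks.size # prints 2
puts blocks.first
```

Here slice_before only regroups lines that the flip-flop has already selected, so it relies on the same assumption as the flip-flop itself: the opening and closing markers never share a line.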

Upvotes: 2

Kevin Schwerdtfeger

Reputation: 590

From what I can tell, your regex is correct, and (.*?) is a non-greedy match. I think the issue you are running into is that match in Ruby returns only the first match of the regex. For instance:

File.read("abc.rb").match(/<<-DOC(.*?)DOC/m)
=> #<MatchData "<<-DOC abc:\n        return \"hahaha\"\n    DOC" 1:" abc:\n        return \"hahaha\"\n    "> 

What you really want to use is scan:

File.read("abc.rb").scan(/<<-DOC(.*?)DOC/m)
=> [[" abc:\n        return \"hahaha\"\n    "], [" efg:\n        return \"hehehe\"\n    "]] 

This returns an array of arrays, with each inner array containing the captured groups from one match of the regex. See https://ruby-doc.org/core-2.2.0/String.html#method-i-scan
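
Note that when the pattern has a capture group, scan returns only the captured text, so the <<-DOC and DOC delimiters are dropped. If you want the whole match, as in the question, one option is to drop the group: scan then returns a flat array of the matched strings. Below is a minimal sketch, with the file content inlined as a hypothetical `source` string so it runs on its own. The second pattern is my own assumption about a stricter variant that only accepts a closing DOC on a line by itself:

```ruby
# Sample input mirroring abc.rb from the question (inlined for illustration).
source = <<~'SRC'
  def abc
      <<-DOC abc:
          return "hahaha"
      DOC
      puts "hahaha"
  end

  def efg
      <<-DOC efg:
          return "hehehe"
      DOC
      puts "hehehe"
  end
SRC

# No capture group: scan returns each whole match as a string.
blocks = source.scan(/<<-DOC.*?DOC/m)

# Stricter variant (an assumption, not from the question): the closing DOC
# must sit alone on its line, preceded only by spaces or tabs.
strict_blocks = source.scan(/<<-DOC.*?^[ \t]*DOC$/m)

puts blocks.size # prints 2
puts blocks
```

For this sample both patterns give the same two strings. The stricter one only matters if the text inside a block can itself contain the letters DOC.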

Upvotes: 2
