Reputation: 753
I want to extract all text between two keywords (<<-DOC and DOC) from a file. For example, suppose my file contains the following:
abc.rb
def abc
<<-DOC abc:
return "hahaha"
DOC
puts "hahaha"
end
def efg
<<-DOC efg:
return "hehehe"
DOC
puts "hehehe"
end
I want to get two matches:
<<-DOC abc:
return "hahaha"
DOC
and
<<-DOC efg:
return "hehehe"
DOC
I tried File.read("abc.rb").match(/<<-DOC(.*?)DOC/m), but it gives all the text between the first occurrence of <<-DOC (inside abc) and the last occurrence of DOC (inside efg).
Upvotes: 1
Views: 186
Reputation: 121020
Flip-flop solution:
File.readlines("abc.rb").select do |line|
true if (line.include? '<<-DOC')...(line.include? 'DOC')
end
#⇒ [
# [0] " <<-DOC abc:",
# [1] " return \"hahaha\"",
# [2] " DOC",
# [3] " <<-DOC efg:",
# [4] " return \"hehehe\"",
# [5] " DOC"
# ]
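The three-dot form matters here: with two dots the end condition is tested on the same line that switched the flip-flop on, and since '<<-DOC' itself contains 'DOC', every range would close immediately. The result above is also one flat list; if you need one group per block, something like slice_before can split it. A minimal sketch, assuming the file contents sit in a hypothetical `source` string instead of being read from abc.rb:

```ruby
# Stand-in for the contents of abc.rb (assumption: copied from the question).
source = <<~'RUBY'
  def abc
    <<-DOC abc:
    return "hahaha"
    DOC
    puts "hahaha"
  end
  def efg
    <<-DOC efg:
    return "hehehe"
    DOC
    puts "hehehe"
  end
RUBY

lines = source.lines.map(&:chomp)

# Three-dot flip-flop: on from a '<<-DOC' line through the next 'DOC' line.
selected = lines.select do |line|
  true if (line.include?('<<-DOC'))...(line.include?('DOC'))
end

# Start a new group at every opening line, giving one array per block.
groups = selected.slice_before { |line| line.include?('<<-DOC') }.to_a

groups.each { |group| puts group, '---' }
```

Each element of `groups` is then an array holding the lines of one block, delimiters included.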
Upvotes: 2
Reputation: 590
From what I can tell, your regex is correct, and (.*?) is a non-greedy match. I think the issue you are running into is that match in Ruby returns only the first match of the regex. For instance:
File.read("abc.rb").match(/<<-DOC(.*?)DOC/m)
=> #<MatchData "<<-DOC abc:\n return \"hahaha\"\n DOC" 1:" abc:\n return \"hahaha\"\n ">
What you really want to use is scan:
File.read("abc.rb").scan(/<<-DOC(.*?)DOC/m)
=> [[" abc:\n return \"hahaha\"\n "], [" efg:\n return \"hehehe\"\n "]]
This returns an array of arrays, with each inner array containing the captured groups from one match of the regex. See https://ruby-doc.org/core-2.2.0/String.html#method-i-scan
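If you want the <<-DOC and DOC delimiters included in each result, as in the question's expected output, one option is to drop the capture group: when the pattern has no groups, scan returns each whole match as a string. A minimal sketch, again assuming the file contents are held in a hypothetical `source` string rather than read from abc.rb:

```ruby
# Stand-in for the contents of abc.rb (assumption: copied from the question).
source = <<~'RUBY'
  def abc
    <<-DOC abc:
    return "hahaha"
    DOC
    puts "hahaha"
  end
  def efg
    <<-DOC efg:
    return "hehehe"
    DOC
    puts "hehehe"
  end
RUBY

# No capture group, so scan yields an array of full-match strings.
# The /m flag lets . match newlines; .*? stops at the first closing DOC.
blocks = source.scan(/<<-DOC.*?DOC/m)

blocks.each { |block| puts block, '---' }
```

With a real file, `File.read("abc.rb").scan(/<<-DOC.*?DOC/m)` should behave the same way.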
Upvotes: 2