I Woo

Reputation: 21

Best way to capture multiple matches

The same text message contains a fixed part that appears once (the item ID) and multiple lines (several references and the dimensions of each part):

..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53

I want to parse and store the IdCode, the references, and an array of dimensions.

When I apply REGEX.match(my_text), I get only one occurrence of REF and one of CMT:

REGEX = %r{
ID\/(?<IdCode> \d{10})\s 
(REF\/(?<ReferenceCode> \w{3}\-\d{3}\-\d)\s)+ 
(CMT\/(?<Length> \d+)\-(?<Width> \d+)\-(?<Height> \d+)\s)+
}x

The result looks like this:

IdCode: "1100008273"
ReferenceCode:  "D14-219-0"
Length: "37"
Width:  "37"
Height: "20"
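A minimal sketch of the behavior shown above (the shortened sample string is illustrative): when a capturing group sits inside a repeated group, Ruby's regex engine overwrites that capture on every repetition, so the MatchData keeps only the value from the last iteration, even though the overall match spans every line.

```ruby
# Three REF lines, matched by one repeated group.
text = "REF/D14-109-0\nREF/D14-209-0\nREF/D14-219-0\n"

# The named group is inside (?:...)+, so each repetition overwrites it.
m = text.match(/(?:REF\/(?<ReferenceCode>\w{3}-\d{3}-\d)\n)+/)

puts m[0].lines.size    # prints 3: the whole match covers all three lines
puts m[:ReferenceCode]  # prints D14-219-0: only the last captured value survives
```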

Is there a way to capture multiple occurrences without iterating?

Upvotes: 2

Views: 111

Answers (2)

Cary Swoveland

Reputation: 110665

Suppose your string were:

str = %w| dog
          ID/11000082734
          REF/D14-109-0
          REF/D14-209-0
          CMT/49-41-31
          CMT/44-34-53
          cat
          ID/11000082735
          REF/D14-109-1
          REF/D14-209-1
          CMT/49-41-32
          CMT/44-34-54
          pig |.join("\n")

  #=> "dog\nID/11000082734\nREF/D14-109-0\nREF/D14-209-0\nCMT/49-41-31\nCMT/44-34-53\ncat\nID/11000082735\nREF/D14-109-1\nREF/D14-209-1\nCMT/49-41-32\nCMT/44-34-54\npig"

Then you could write:

r = /(ID\/\d{11})                     # match string in capture group 1
    \n                                # match newline
    ((?:REF\/[A-Z]\d{2}-\d{3}-\d\n)+) # match consecutive REF lines in capture group 2
    ((?:CMT\/\d{2}-\d{2}-\d{2}\n)+)   # match consecutive CMT lines in capture group 3
    /x                                # free-spacing regex definition mode 

arr = str.scan(r)
  #=> [["ID/11000082734", "REF/D14-109-0\nREF/D14-209-0\n",
  #     "CMT/49-41-31\nCMT/44-34-53\n"],
  #    ["ID/11000082735", "REF/D14-109-1\nREF/D14-209-1\n",
  #     "CMT/49-41-32\nCMT/44-34-54\n"]]

This extracts the desired information in a single String#scan call, with no explicit loop.

At this point it may be desirable to convert arr to a more convenient data structure. For example:

arr.map do |a,b,c| 
  { :id  => a[/\d+/],
    :ref => b.split("\n").map { |s| s[4..-1] },
    :cmt => c.scan(/(\d{2})-(\d{2})-(\d{2})/).map { |e|
              [:length, :width, :height].zip(e.map(&:to_i)).to_h }
  }
end
  #=> [{ :id=>"11000082734",
  #      :ref=>["D14-109-0", "D14-209-0"],
  #      :cmt=>[{ :length=>49, :width=>41, :height=>31 },
  #             { :length=>44, :width=>34, :height=>53 }
  #            ]
  #    },
  #    { :id=>"11000082735",
  #      :ref=>["D14-109-1", "D14-209-1"],
  #      :cmt=>[{ :length=>49, :width=>41, :height=>32 },
  #             { :length=>44, :width=>34, :height=>54 }
  #            ]
  #    }
  #   ] 
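A sketch applying the same two steps to the question's own message. One assumption worth flagging: the question's text may not end with a newline after its last CMT line, and the pattern above requires "\n" after every CMT line, so it would silently drop that last line. This variant (an editorial adjustment, not part of the answer) accepts either "\n" or the end of the string (\z) there.

```ruby
# The question's message, with no newline after the last CMT line.
text = <<~MSG.chomp
  ..some random text here..
  ID/11000082734
  REF/D14-109-0
  REF/D14-209-0
  REF/D14-219-0
  CMT/59-40-25
  CMT/38-25-28
  CMT/59-40-25
  CMT/37-37-20
  CMT/40-40-20
  CMT/37-37-20
  CMT/49-41-31
  CMT/44-34-53
MSG

# Same pattern as above, except a CMT line may end at "\n" or at end of string.
r = /(ID\/\d{11})\n
     ((?:REF\/[A-Z]\d{2}-\d{3}-\d\n)+)
     ((?:CMT\/\d{2}-\d{2}-\d{2}(?:\n|\z))+)
    /x

# Convert each [id, refs, cmts] triple into a hash, as in the answer.
records = text.scan(r).map do |id, refs, cmts|
  { id:  id[/\d+/],
    ref: refs.split("\n").map { |s| s[4..-1] },
    cmt: cmts.scan(/(\d{2})-(\d{2})-(\d{2})/).map { |e|
           [:length, :width, :height].zip(e.map(&:to_i)).to_h } }
end
```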

Upvotes: 1

Tim007

Reputation: 2557

Try this:

(?<IdCode>\d{10,})|REF\/(?<ReferenceCode>\w{3}\-\d{3}\-\d)|CMT\/(?<Length>\d+)\-(?<Width>\d+)\-(?<Height>\d+)


Explanation:
(?<name> … ): Named capturing group
\: Escapes a special character
|: Alternation (OR)
+: One or more
{10,}: Ten or more

Input

..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53

Output:

MATCH 1
IdCode  [29-40] `11000082734`
MATCH 2
ReferenceCode   [45-54] `D14-109-0`
MATCH 3
ReferenceCode   [59-68] `D14-209-0`
MATCH 4
ReferenceCode   [73-82] `D14-219-0`
MATCH 5
Length  [87-89] `59`
Width   [90-92] `40`
Height  [93-95] `25`
MATCH 6
Length  [100-102]   `38`
Width   [103-105]   `25`
Height  [106-108]   `28`
MATCH 7
Length  [113-115]   `59`
Width   [116-118]   `40`
Height  [119-121]   `25`
MATCH 8
Length  [126-128]   `37`
Width   [129-131]   `37`
Height  [132-134]   `20`
MATCH 9
Length  [139-141]   `40`
Width   [142-144]   `40`
Height  [145-147]   `20`
MATCH 10
Length  [152-154]   `37`
Width   [155-157]   `37`
Height  [158-160]   `20`
MATCH 11
Length  [165-167]   `49`
Width   [168-170]   `41`
Height  [171-173]   `31`
MATCH 12
Length  [178-180]   `44`
Width   [181-183]   `34`
Height  [184-186]   `53`
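In Ruby, this pattern yields one flat match per line, so the values still have to be regrouped. A minimal sketch of doing that with the block form of String#scan, assuming one ID per message; the shortened input and the `record` hash layout are illustrative choices, not part of the answer. Because the pattern uses named groups, scan passes the five captures to the block in pattern order, with nil for the groups of the alternatives that did not match.

```ruby
# Shortened copy of the question's input.
text = <<~MSG
  ..some random text here..
  ID/11000082734
  REF/D14-109-0
  REF/D14-209-0
  CMT/59-40-25
  CMT/38-25-28
MSG

# The pattern from this answer: each alternative matches one ID, REF or CMT value.
PATTERN = /(?<IdCode>\d{10,})|REF\/(?<ReferenceCode>\w{3}\-\d{3}\-\d)|CMT\/(?<Length>\d+)\-(?<Width>\d+)\-(?<Height>\d+)/

record = { id: nil, refs: [], dims: [] }

# Dispatch on whichever group participated in the current match.
text.scan(PATTERN) do |id, ref, length, width, height|
  if id
    record[:id] = id
  elsif ref
    record[:refs] << ref
  else
    record[:dims] << { length: length.to_i, width: width.to_i, height: height.to_i }
  end
end

p record
```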

Upvotes: 0
