tahwos

Reputation: 574

Grouping Regex in Ruby

Sample Text

    outline: 4 0
      corner: 1 347980000 -2540000 0
      corner: 2 347980000 -20320000 0
      corner: 3 482600000 -20320000 0
      corner: 4 482600000 -2540000 0

    outline: 4 1
      corner: 1 0 -2540000 0
      corner: 2 345440000 -2540000 0
      corner: 3 345440000 -20320000 0
      corner: 4 0 -20320000 0

    outline: 8 2
      corner: 1 0 0 0
      corner: 2 0 35560000 0
      corner: 3 53340000 35560000 0
      corner: 4 53340000 76200000 0
      corner: 5 449580000 76200000 0
      corner: 6 449580000 30226000 0
      corner: 7 482600000 30226000 0
      corner: 8 482600000 0 0

    outline: 4 3
      corner: 1 0 38100000 0
      corner: 2 50800000 38100000 0
      corner: 3 50800000 76200000 0
      corner: 4 0 76200000 0

    outline: 4 4
      corner: 1 482600000 76200000 0
      corner: 2 482854000 31750000 0
      corner: 3 450850000 31750000 0
      corner: 4 450850000 76200000 0

/^\s+corner:\s*(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)/m Captures all values for corners.

/^\s*outline:\s*(\d+)\s+(\d+)$.*?\s+corner:\s*(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)/m Captures all outlines, but only the first corner of each outline.

/^\s*outline:\s*(\d+)\s+(\d+)$.*?(^\s+corner:\s*(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)$).*?/m Matches the same thing as the second, but also captures the whole corner line as an extra group, so the captures look like this:

4
0
corner: 1 347980000 -2540000 0
1
347980000
-2540000
0

I am trying to get it to capture all of the outlines together with their related corners. The grouping is obviously wrong. Any suggestions?

Thank you ;-)

Upvotes: 0

Views: 160

Answers (2)

sawa

Reputation: 168091

Since the number of captures you want varies (probably without limit), you cannot do that in one regex. String#scan comes in handy in such a case.

text.scan(/^\s*outline:\s*(\d+)\s+(\d+)\n(.*?)(?:\n\n|\z)/m)
.map{|a, b, corners| [a, b, corners.scan(/^\s+corner:\s*(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)/)]}

will give you:

[["4", "0",
  [["1", "347980000", "-2540000", "0"],
   ["2", "347980000", "-20320000", "0"],
   ["3", "482600000", "-20320000", "0"],
   ["4", "482600000", "-2540000", "0"]]],
 ["4", "1",
  [["1", "0", "-2540000", "0"],
   ["2", "345440000", "-2540000", "0"],
   ["3", "345440000", "-20320000", "0"],
   ["4", "0", "-20320000", "0"]]],
 ["8", "2",
  [["1", "0", "0", "0"],
   ["2", "0", "35560000", "0"],
   ["3", "53340000", "35560000", "0"],
   ["4", "53340000", "76200000", "0"],
   ["5", "449580000", "76200000", "0"],
   ["6", "449580000", "30226000", "0"],
   ["7", "482600000", "30226000", "0"],
   ["8", "482600000", "0", "0"]]],
 ["4", "3",
  [["1", "0", "38100000", "0"],
   ["2", "50800000", "38100000", "0"],
   ["3", "50800000", "76200000", "0"],
   ["4", "0", "76200000", "0"]]],
 ["4", "4",
  [["1", "482600000", "76200000", "0"],
   ["2", "482854000", "31750000", "0"],
   ["3", "450850000", "31750000", "0"],
   ["4", "450850000", "76200000", "0"]]]]

If you want numbers instead of strings,

text.scan(/^\s*outline:\s*(\d+)\s+(\d+)\n(.*?)(?:\n\n|\z)/m)
.map{|a, b, corners| [a.to_i, b.to_i, corners.scan(/^\s+corner:\s*(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)/).map{|corner| corner.map(&:to_i)}]}

will give you:

[[4, 0,
  [[1, 347980000, -2540000, 0],
   [2, 347980000, -20320000, 0],
   [3, 482600000, -20320000, 0],
   [4, 482600000, -2540000, 0]]],
 [4, 1,
  [[1, 0, -2540000, 0],
   [2, 345440000, -2540000, 0],
   [3, 345440000, -20320000, 0],
   [4, 0, -20320000, 0]]],
 [8, 2,
  [[1, 0, 0, 0],
   [2, 0, 35560000, 0],
   [3, 53340000, 35560000, 0],
   [4, 53340000, 76200000, 0],
   [5, 449580000, 76200000, 0],
   [6, 449580000, 30226000, 0],
   [7, 482600000, 30226000, 0],
   [8, 482600000, 0, 0]]],
 [4, 3,
  [[1, 0, 38100000, 0],
   [2, 50800000, 38100000, 0],
   [3, 50800000, 76200000, 0],
   [4, 0, 76200000, 0]]],
 [4, 4,
  [[1, 482600000, 76200000, 0],
   [2, 482854000, 31750000, 0],
   [3, 450850000, 31750000, 0],
   [4, 450850000, 76200000, 0]]]]
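
As a small illustration of the opening point (a sketch on a made-up one-line string, not the real file): a capture group inside a repeated pattern only keeps its last repetition, which is why one regex cannot collect a varying number of corners, while String#scan returns one entry per match.

```ruby
# Made-up sample string, used only for this illustration.
line = "corner: 1 corner: 2 corner: 3"

# The group (\d+) sits inside a repeated non-capturing group,
# so only its final repetition is kept in the match data.
m = line.match(/(?:corner:\s*(\d+)\s*)+/)
last_only = m[1]

# String#scan yields one result per match instead.
all = line.scan(/corner:\s*(\d+)/).flatten

p last_only  # "3"
p all        # ["1", "2", "3"]
```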

Upvotes: 1

DigitalRoss

Reputation: 146043

I'm not sure I even want to know what the purpose of scanning that file with a regex is.

But you know, it would be easy to parse using virtually any technique other than regular expressions.
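
For instance, here is a minimal sketch of a plain line-by-line parse that uses only String#split. The method name parse_outlines and the hash layout are my own choices for illustration, and it assumes every corner line is preceded by an outline line.

```ruby
# Hypothetical helper: walk the text line by line and split on whitespace.
def parse_outlines(text)
  outlines = []
  text.each_line do |line|
    key, *values = line.split
    case key
    when "outline:"
      # Start a new outline record with its two numbers.
      outlines << { outline: values.map { |v| Integer(v) }, corners: [] }
    when "corner:"
      # Assumes an outline line has already been seen.
      outlines.last[:corners] << values.map { |v| Integer(v) }
    end
    # Blank or unrecognized lines fall through and are ignored.
  end
  outlines
end

sample = <<~TEXT
  outline: 4 0
    corner: 1 347980000 -2540000 0
    corner: 2 347980000 -20320000 0

  outline: 4 1
    corner: 1 0 -2540000 0
TEXT

result = parse_outlines(sample)
```

Integer(v) raises on anything that is not a number, so a malformed line fails loudly instead of silently turning into 0.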

And in fact, with just a slight change in syntax [1], it's a good YAML file:

- outline: 4 0
  - corner: 1 347980000 -2540000 0
  - corner: 2 347980000 -20320000 0
  - corner: 3 482600000 -20320000 0
  - corner: 4 482600000 -2540000 0

- outline: 4 1
. . .
. . .

And there you go, a perfectly organized data structure with one line of Ruby:

 > pp YAML::load_file 'corners.yaml'
[{"outline"=>
   [{"corner"=>"1 347980000 -2540000 0"},
    {"corner"=>"2 347980000 -20320000 0"},
    {"corner"=>"3 482600000 -20320000 0"},
    {"corner"=>"4 482600000 -2540000 0"}]},
 {"outline"=>
   [{"corner"=>"1 0 -2540000 0"},
    {"corner"=>"2 345440000 -2540000 0"},
    {"corner"=>"3 345440000 -20320000 0"},
    {"corner"=>"4 0 -20320000 0"}]},
 {"outline"=>
   [{"corner"=>"1 0 0 0"},
    {"corner"=>"2 0 35560000 0"},
    {"corner"=>"3 53340000 35560000 0"},
    {"corner"=>"4 53340000 76200000 0"},
    {"corner"=>"5 449580000 76200000 0"},
    {"corner"=>"6 449580000 30226000 0"},
    {"corner"=>"7 482600000 30226000 0"},
    {"corner"=>"8 482600000 0 0"}]},
 {"outline"=>
   [{"corner"=>"1 0 38100000 0"},
    {"corner"=>"2 50800000 38100000 0"},
    {"corner"=>"3 50800000 76200000 0"},
    {"corner"=>"4 0 76200000 0"}]},
 {"outline"=>
   [{"corner"=>"1 482600000 76200000 0"},
    {"corner"=>"2 482854000 31750000 0"},
    {"corner"=>"3 450850000 31750000 0"},
    {"corner"=>"4 450850000 76200000 0"}]}]
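
Each corner in that structure is still a single string. A sketch of splitting the values into integers, assuming the loaded data has exactly the shape printed above (the variable names and the shortened literal are only for illustration):

```ruby
# Shortened literal with the same shape as the pp output above.
loaded = [
  { "outline" => [
    { "corner" => "1 347980000 -2540000 0" },
    { "corner" => "2 347980000 -20320000 0" }
  ] }
]

# For every outline, turn each "a b c d" string into an array of integers.
corners_by_outline = loaded.map do |entry|
  entry["outline"].map { |c| c["corner"].split.map(&:to_i) }
end

p corners_by_outline
```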

1. Admittedly, I did use a vim(1) regex to convert the file: :%s/^ */&- /

Upvotes: 0
