Reputation: 233
So I'm currently trying to sort values from a file. I'm stuck on finding the first attribute and am not sure why. I'm new to regex and Ruby, so I'm not sure how to approach the problem. I'm trying to find the values of a, b, c, d, and e, all of which are positive numbers.
Here's what the line will look like:
length=<a> begin=(<b>,<c>) end=(<d>,<e>)
Here's what I'm using to find the values:
current_line = file.gets
if current_line == nil then return end
while current_line = file.gets do
  if line =~ /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
    length, begin_x, begin_y, end_x, end_y = $1, $2, $3, $4, $5
    puts("length:" + length.to_s + " begin:" + begin_x.to_s + "," + begin_y.to_s + " end:" + end_x.to_s + "," + end_y.to_s)
  end
end
For some reason it never prints anything, so I'm assuming it never finds a match.
Sample input: length=4 begin=(0,0) end=(3,0)
A line has 0-4 decimals, separated by commas, after 2 integers. So it could be any of these:
2 4 1.3434324,3.543243,4.525324
1 2
18 3.3213,9.3233,1.12231,2.5435
7 9 2.2,1.899990
0 3 2.323
Upvotes: 0
Views: 97
Reputation: 48599
Sample input:
length=4 begin=(0,0) end=(3,0)
data.txt:
length=3 begin=(0,0) end=(3,0)
length=4 begin=(0,1) end=(0,5)
length=2 begin=(1,3) end=(1,5)
Try this:
require 'pp'
Line = Struct.new(
  :length,
  :begin_x,
  :begin_y,
  :end_x,
  :end_y,
)
lines = []
IO.foreach('data.txt') do |line|
  numbers = []
  line.scan(/\d+/) do |match|
    numbers << match.to_i
  end
  lines << Line.new(*numbers)
end
pp lines
puts lines[-1].begin_x
--output:--
[#<struct Line length=3, begin_x=0, begin_y=0, end_x=3, end_y=0>,
#<struct Line length=4, begin_x=0, begin_y=1, end_x=0, end_y=5>,
#<struct Line length=2, begin_x=1, begin_y=3, end_x=1, end_y=5>]
1
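One caveat: scan(/\d+/) collects every run of digits, so a malformed line would produce a struct with missing fields or raise an error for extra ones. A minimal sketch that skips such lines, assuming a valid line contains exactly five numbers (the names Segment and parse_segment are made up for illustration, and a heredoc stands in for data.txt):

```ruby
Segment = Struct.new(:length, :begin_x, :begin_y, :end_x, :end_y)

# Hypothetical helper: returns a Segment, or nil when the line does not
# contain exactly five runs of digits.
def parse_segment(line)
  numbers = line.scan(/\d+/).map(&:to_i)
  return nil unless numbers.size == 5
  Segment.new(*numbers)
end

# The heredoc stands in for the contents of data.txt.
text = <<~DATA
  length=3 begin=(0,0) end=(3,0)
  not a data line
  length=2 begin=(1,3) end=(1,5)
DATA

# compact drops the nil produced by the malformed middle line.
segments = text.each_line.map { |line| parse_segment(line) }.compact
puts segments.size          # 2
puts segments.last.begin_x  # 1
```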
With this data.txt:
2 4 1.3434324,3.543243,4.525324
1 2
18 3.3213,9.3233,1.12231,2.5435
7 9 2.2,1.899990
0 3 2.323
Try this:
require 'pp'
data = []
IO.foreach('data.txt') do |line|
  pieces = line.split
  csv_numbers = pieces[-1]
  next if not csv_numbers.index('.')  # skip the case where there are no floats on a line
  floats = csv_numbers.split(',')
  data << floats.map(&:to_f)
end
pp data
--output:--
[[1.3434324, 3.543243, 4.525324],
[3.3213, 9.3233, 1.12231, 2.5435],
[2.2, 1.89999],
[2.323]]
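If the leading integers are needed as well, the same split-based approach can be extended. A sketch, assuming the float list (when present) is always the last whitespace-separated field and contains a decimal point; parse_row is a hypothetical name:

```ruby
# Hypothetical helper: returns [integers, floats] for one line.
# floats is empty when the last field has no decimal point.
def parse_row(line)
  pieces = line.split
  floats = []
  # to_s guards against an empty line, where pieces.last is nil.
  if pieces.last.to_s.include?('.')
    floats = pieces.pop.split(',').map(&:to_f)
  end
  [pieces.map(&:to_i), floats]
end

p parse_row("2 4 1.3434324,3.543243,4.525324")
p parse_row("1 2")
```

Lines without floats are kept here rather than skipped, so the caller can decide what to do with them.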
Upvotes: 0
Reputation: 110685
Here is your regex:
r = /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
str.scan(r)
#=> []
First, we need to escape the parentheses:
r = /length=<(\d+)> begin=\((\d+),(\d+)\) end=\((\d+),(\d+)\)/
Next, add the missing < and > after "begin" and "end".
r = /length=<(\d+)> begin=\(<(\d+)>,<(\d+)>\) end=\(<(\d+)>,<(\d+)>\)/
Now let's try it:
str = "length=<4779> begin=(<21>,<47>) end=(<356>,<17>)"
but first, let's set the mood
str.scan(r)
#=> [["4779", "21", "47", "356", "17"]]
Success!
Lastly (though probably not necessary), we might replace the single spaces with \s+, which permits one or more whitespace characters:
r = /length=<(\d+)>\s+begin=\(<(\d+)>,<(\d+)>\)\s+end=\(<(\d+)>,<(\d+)>\)/
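Two details in the question's loop look like they would also prevent output, independent of the regex: the condition tests line while the loop assigns current_line, and the first file.gets discards a line before the loop starts. Note too that the sample input (length=4 begin=(0,0) end=(3,0)) has no angle brackets at all. A sketch of the loop under that assumption, with StringIO standing in for the opened file and PATTERN being a made-up name:

```ruby
require 'stringio'

# Assumption: real lines look like the sample input, with no angle brackets.
PATTERN = /length=(\d+)\s+begin=\((\d+),(\d+)\)\s+end=\((\d+),(\d+)\)/

# StringIO stands in for the opened file so the sketch runs on its own.
file = StringIO.new("length=4 begin=(0,0) end=(3,0)\nlength=2 begin=(1,3) end=(1,5)\n")

results = []
while (current_line = file.gets)
  match = PATTERN.match(current_line)
  next unless match
  # captures returns the five groups as strings; convert them to integers.
  results << match.captures.map(&:to_i)
end

results.each do |length, begin_x, begin_y, end_x, end_y|
  puts "length:#{length} begin:#{begin_x},#{begin_y} end:#{end_x},#{end_y}"
end
```

If the real lines do carry the angle brackets, the bracketed regex above can be swapped in for PATTERN without changing the loop.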
Addendum
The OP has asked how this would be modified if some of the numeric values were floats. I do not understand precisely what has been requested, but the following could be modified as required. I've assumed all the numbers are non-negative. I've also illustrated one way to "build" a regex, using Regexp.new.
s1 = '<(\d+(?:\.\d+)?)>' # note single quotes
#=> "<(\\d+(?:\\.\\d+)?)>"
s2 = "=\\(#{s1},#{s1}\\)"
#=> "=\\(<(\\d+(?:\\.\\d+)?)>,<(\\d+(?:\\.\\d+)?)>\\)"
r = Regexp.new("length=#{s1} begin#{s2} end#{s2}")
#=> /length=<(\d+(?:\.\d+)?)> begin=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\) end=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\)/
str = "length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)"
str.scan(r)
#=> [["47.79", "21", "4.7", "0.356", "17.999"]]
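scan returns the captures as strings. A sketch of converting them to numbers, assuming values with a decimal point should become Float and the rest Integer; the constant names and to_number are hypothetical:

```ruby
# Same building blocks as above, under illustrative constant names.
NUM = '<(\d+(?:\.\d+)?)>'
PAIR = "=\\(#{NUM},#{NUM}\\)"
FLOAT_LINE = Regexp.new("length=#{NUM} begin#{PAIR} end#{PAIR}")

# Hypothetical helper: Float for text with a decimal point, else Integer.
# Base 10 is given explicitly so a leading zero is not read as octal.
def to_number(text)
  text.include?('.') ? Float(text) : Integer(text, 10)
end

str = "length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)"
# first picks the single match; this sketch assumes the line does match.
values = str.scan(FLOAT_LINE).first.map { |s| to_number(s) }
p values
```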
Upvotes: 2