Reputation: 405
I need to modify a string in ruby. Specifically I'm trying to remove 'holes' from a WKT string. Holes are defined as any single set of parenthesis after the first one with numbers within. For example in this string...
POLYGON ((1 2, 3 4), (5 6, 7 8))
I would need to remove , (5 6, 7 8)
because this parenthesis data is a hole, and the comma and the space don't belong except to separate sets of parentheses.
I am avoiding ruby methods like match
or scan
to try to optimize for speed and achieve O(n) speed.
Here's what I have so far.
def remove_holes_from(wkt)
output_string = ""
last_3_chars = [ nil, nil, nil ]
number_chars = [ '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ]
should_delete_chars = false
wkt.each_char do |char|
last_3_chars.shift
last_3_chars.push(char)
if should_delete_chars == false
if number_chars.include?(last_3_chars[0]) && last_3_chars[1] == ")" && last_3_chars[2] == ","
should_delete_chars = true
next
else
output_string += char
end
end
if should_delete_chars == true
if number_chars.include?(last_3_chars[0]) && last_3_chars[1] == ")" && last_3_chars[2] == ")"
should_delete_chars = false
output_string += char
else
next
end
end
end
output_string
end
The problem I am facing is that for a large polygon, like the United States (over 500,000 characters and over 40,000 points) it takes me 66 seconds to complete this. You can find the string here: https://gist.github.com/cheeseandpepper/9da5ca6ade921da2b4ab
Can anyone think of optimizations to this example I can use? Or maybe a separate approach? Thanks.
Upvotes: 1
Views: 111
Reputation: 405
Whelp... regex wins!
wkt.gsub(/, \(-?\d.*?\)/, "")
took me 0.003742 seconds
As for the regex
,
Literal comma
Literal space
\(
Literal open parenthesis
-?
Optional negative sign
\d
Any digit (because the previous is optional, we need to make sure we have a digit vs another open parenthesis)
.*
Any number of any characters (will be digits, a comma and maybe a negative sign)
?\)
Up to and including a literal close parenthesis
Upvotes: 2