cheeseandpepper
cheeseandpepper

Reputation: 405

What is the fastest way to modify a large string in Ruby?

I need to modify a string in ruby. Specifically I'm trying to remove 'holes' from a WKT string. Holes are defined as any single set of parenthesis after the first one with numbers within. For example in this string... POLYGON ((1 2, 3 4), (5 6, 7 8))

I would need to remove , (5 6, 7 8) because this parenthesis data is a hole, and the comma and the space don't belong except to separate sets of parentheses.

I am avoiding ruby methods like match or scan to try to optimize for speed and achieve O(n) speed.

Here's what I have so far.

def remove_holes_from(wkt)
    output_string = ""
    last_3_chars  = [ nil, nil, nil ]
    number_chars  = [ '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ]

    should_delete_chars = false

    wkt.each_char do |char|
      last_3_chars.shift
      last_3_chars.push(char)
      if should_delete_chars == false
        if number_chars.include?(last_3_chars[0]) && last_3_chars[1] == ")" && last_3_chars[2] == ","
          should_delete_chars = true
          next
        else
          output_string += char
        end
      end

      if should_delete_chars == true
        if number_chars.include?(last_3_chars[0]) && last_3_chars[1] == ")" && last_3_chars[2] == ")"
          should_delete_chars = false
          output_string += char
        else
          next
        end
      end
    end
    output_string
  end

The problem I am facing is that for a large polygon, like the United States (over 500,000 characters and over 40,000 points) it takes me 66 seconds to complete this. You can find the string here: https://gist.github.com/cheeseandpepper/9da5ca6ade921da2b4ab

Can anyone think of optimizations to this example I can use? Or maybe a separate approach? Thanks.

Upvotes: 1

Views: 111

Answers (1)

cheeseandpepper
cheeseandpepper

Reputation: 405

Whelp... regex wins!

wkt.gsub(/, \(-?\d.*?\)/, "")

took me 0.003742 seconds

As for the regex

, Literal comma

Literal space

\( Literal open parenthesis

-? Optional negative sign

\d Any digit (because the previous is optional, we need to make sure we have a digit vs another open parenthesis)

.* Any number of any characters (will be digits, a comma and maybe a negative sign)

?\) Up to and including a literal close parenthesis

Upvotes: 2

Related Questions