Ian Lin

Reputation: 384

Ruby: how to remove repeated regex matches in a string

For a string like

s = "(string1) this is text (string2) that's separated (string3)"

I need a way to remove all the parentheses and the text inside them. However, if I use the following, it returns an empty string:

s.gsub(/\(.*\)/, "")

What can I use to get the following?

" this is text  that's separated "

Upvotes: 0

Views: 126

Answers (3)

Cary Swoveland

Reputation: 110755

You could do the following:

s.gsub(/\(.*?\)/,'')
  # => " this is text  that's separated "

The ? after .* makes the quantifier "non-greedy", so each match stops at the first closing parenthesis rather than the last. Without it, if:

s = "A: (string1) this is text (string2) that's separated (string3) B"

then

s.gsub(/\(.*\)/,'')
  #=> "A:  B" 
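To see what each pattern actually matches, here is a small sketch using String#scan on the string from the question. The variable names greedy and lazy are just for illustration.

```ruby
s = "(string1) this is text (string2) that's separated (string3)"

# Greedy: .* runs to the last ")" in the string, so there is a single match
greedy = s.scan(/\(.*\)/)

# Non-greedy: .*? stops at the first ")" it can, giving one match per group
lazy = s.scan(/\(.*?\)/)

p greedy.length  #=> 1
p lazy           #=> ["(string1)", "(string2)", "(string3)"]
```

Because the question's string starts with "(" and ends with ")", the single greedy match covers the whole string, which is why the original gsub returned an empty string.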

Edit: I ran the following benchmarks for various methods. You will see that there is one important take-away.

require 'benchmark'

n = 10_000_000
s = "(string1) this is text (string2) that's separated (string3)"

Benchmark.bm do |bm|
  bm.report 'sawa' do
    n.times { s.gsub(/\([^()]*\)/,'') }
  end 
  bm.report 'cary' do
    n.times { s.gsub(/\(.*?\)/,'') }
  end 
  bm.report 'cary1' do
    n.times { s.split(/\(.*?\)/).join }
  end 
  bm.report 'sawa1' do
    n.times { s.split(/\([^()]*\)/).join }
  end 
  bm.report 'sawa!' do
    n.times { s.gsub!(/\([^()]*\)/,'') }
  end
  bm.report 'user1179871' do
    n.times { s.gsub(/\([\w\s]*\)/, '') }
  end
end

              user     system      total        real
sawa        37.110000   0.070000  37.180000 ( 37.182598)
cary        37.000000   0.060000  37.060000 ( 37.066398)
cary1       35.960000   0.050000  36.010000 ( 36.009534)
sawa1       36.450000   0.050000  36.500000 ( 36.503711)
sawa!        7.630000   0.000000   7.630000 (  7.632278)
user1179871 38.500000   0.150000  38.650000 ( 38.666955)

I ran the benchmark several times and the results varied a fair bit. In some cases sawa was slightly faster than cary.
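One thing to keep in mind when reading the sawa! row: gsub! modifies s in place, and the benchmark reuses the same s on every iteration. The sketch below (variable names first and second are only for illustration) shows that after the first call there is nothing left to replace, so later calls have much less work to do. That may account for a good part of the difference, though I have not measured it separately.

```ruby
s = "(string1) this is text (string2) that's separated (string3)"

first  = s.gsub!(/\([^()]*\)/, '')  # mutates s and returns it
second = s.gsub!(/\([^()]*\)/, '')  # no parentheses left, so returns nil

p s       #=> " this is text  that's separated "
p second  #=> nil
```

For the same reason, any report that runs after sawa! in the same Benchmark.bm block sees the already-stripped string unless s is reset first.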

[Edit: I added a modified version of @user1179871's method to the benchmark above, but did not change any of the text of my answer. The modification is described in a comment on @user1179871's answer. It looks to be slightly slower than sawa and cary, but that may not be the case, as the benchmark times vary from run to run, and I did a separate benchmark of the new method.]

Upvotes: 4

user1179871

Reputation: 66

Try this:

string.gsub(/\({1}\w*\){1}/, '')
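A note on this pattern, as a rough sketch: \w* only matches word characters, so it works for the question's (string1) style groups but skips parentheses that contain spaces or punctuation. The {1} quantifiers are also redundant. The sample string below is made up for illustration; the [\w\s] variant is the one used in the benchmark in Cary's answer.

```ruby
sample = "(two words) kept (one)"

# Original pattern: "(two words)" contains a space, so it is left in place
original = sample.gsub(/\({1}\w*\){1}/, '')

# Same pattern without the redundant {1}
simplified = sample.gsub(/\(\w*\)/, '')

# Variant that also allows whitespace inside the parentheses
with_spaces = sample.gsub(/\([\w\s]*\)/, '')

p original     #=> "(two words) kept "
p simplified   #=> "(two words) kept "
p with_spaces  #=> " kept "
```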

Upvotes: 0

sawa

Reputation: 168269

Cary's answer is the simple way. This answer is the efficient way.

s.gsub(/\([^()]*\)/, "")

Keep in mind: non-greedy matching requires backtracking, and in general it is better not to use it if you can avoid it. But for such a simple task, Cary's answer is good enough.
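The two patterns give the same result on the question's string, but they are not interchangeable on every input. As a small sketch, assuming nested parentheses (which the question does not cover), the made-up string below shows where they differ:

```ruby
nested = "a (b (c) d) e"

# Non-greedy: starts at the first "(" and stops at the first ")"
lazy = nested.gsub(/\(.*?\)/, '')

# Negated class: cannot cross another "(", so only the innermost group matches
negated = nested.gsub(/\([^()]*\)/, '')

p lazy     #=> "a  d) e"
p negated  #=> "a (b  d) e"
```

Neither pattern handles nesting fully in a single pass; for the flat input in the question both return " this is text  that's separated ".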

Upvotes: 2
