Konstantin

Reputation: 3123

How to extract repeating character sequences from a string with Ruby regex?

I have a string such as "++++001------zx.......?????????xxxxxxx". I would like to extract every run of two or more identical consecutive characters into a flat array using a Ruby regex:

["++++",
"00",
"------",
".......",
"?????????",
"xxxxxxx"]

I can achieve this with a nested loop:

s="++++001------zx.......?????????xxxxxxx"
t=s.split(//)
i=0
f=[]
while i<=t.length-1 do
  j=i
  part=""
  while t[i]==t[j] do
    part=part+t[j]
    j=j+1
  end
  i=j
  if part.length>=2 then f.push(part) end
end

But I am unable to find an appropriate regex to feed into the scan method. I tried s.scan(/(.)\1++/x), but it returns only the first character of each repeated sequence. Is this possible at all?

Upvotes: 1

Views: 2258

Answers (3)

Wiktor Stribiżew

Reputation: 626748

If you only need the whole match values and want to ignore all capturing group values, similar to how String#match works with a global regex in JavaScript, you can call String#gsub with a single regex argument (no replacement argument). This returns an Enumerator; call .to_a on it to get the array of matches:

text = "++++001------zx.......?????????xxxxxxx" 
p text.gsub(/(.)\1+/m).to_a
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]

See the Ruby demo online and the Rubular demo (note how the matches are highlighted in the Match result field).

I added the m modifier only for completeness, so that . also matches line break characters, which it does not match by default.
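
To illustrate what the m modifier changes, here is a minimal sketch. The input string is made up for this example (it is not from the question) and contains a run of two newline characters:

```ruby
# Hypothetical input: "ab", two newlines in a row, then "cc"
text = "ab\n\ncc"

# Without /m, "." does not match "\n", so the run of newlines is skipped
without_m = text.gsub(/(.)\1+/).to_a

# With /m, "." also matches "\n", so the run of two newlines is returned too
with_m = text.gsub(/(.)\1+/m).to_a

p without_m # => ["cc"]
p with_m    # => ["\n\n", "cc"]
```

If the input never contains line breaks, as in the question, both forms should give the same result.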

Also, see a related Capturing groups don't work as expected with Ruby scan method thread.

Upvotes: 0

Rob Wagner

Reputation: 4421

This is a bit tricky.

You want to capture any run of more than one of a given character, and backreferences are a good way to do this. Your solution is close to being correct.

/((.)\2+)/ should do the trick.

Note that if you use scan, this returns two values for each match, one per capturing group: the first is the whole sequence and the second is the single repeated character.
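
As a rough sketch of how that could look with the string from the question (the variable names pairs and runs are only illustrative), you can keep the first element of each pair to get the flat array:

```ruby
s = "++++001------zx.......?????????xxxxxxx"

# scan yields one [whole_run, repeated_char] pair per match,
# because the regex has two capturing groups
pairs = s.scan(/((.)\2+)/)

# keep only the first element of each pair (the full run)
runs = pairs.map(&:first)

p pairs.first # => ["++++", "+"]
p runs        # => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]
```

The outer group is what makes scan return the full run; the inner group is only there so that \2 has a character to refer back to.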

Upvotes: 3

Arup Rakshit

Reputation: 118261

str = "++++001------zx.......?????????xxxxxxx"
# group consecutive equal characters, then keep only runs longer than one
str.chars.chunk { |e| e }.map { |_, run| run.join if run.size > 1 }.compact
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]

Upvotes: 1
