Marc Fletcher

Reputation: 1082

Ruby Regex consecutive unique characters

Given the string

aabbaacceeeeeaa

I am trying to devise a regex that will capture the substrings that contain three unique characters, each repeated any number of times. The expected result is:

["aabbaacc", "bbaacc", "aacceeeeeaa", "cceeeeeaa"]

I've tried something like

/[(\w)\1+]/ or /[(\w)(?!\1)]/

I know those are incomplete, and I'm not sure whether I am on the right track. I don't know how to exclude characters that have already been matched, or at least I cannot seem to use ?! properly.

Upvotes: 2

Views: 394

Answers (2)

Cary Swoveland

Reputation: 110725

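Best of luck with the regex, but in case you need a backup plan, here is a brute-force approach that returns every substring containing exactly n distinct characters: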

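# Collect every substring of str that contains exactly n distinct characters.
def pull_subs(str, n)
  arr = str.chars
  # Try every window length from n up to the whole string.
  (n..str.size).each_with_object([]) do |len, subs|
    arr.each_cons(len) { |window| subs << window.join if window.uniq.size == n }
  end
end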

str = "aabbaacceeeeeaa"

pull_subs(str, 3)
  #=> ["baac", "acce", "bbaac", "baacc", "aacce", "accee", "abbaac", "bbaacc",
  #    "aaccee", "acceee", "aabbaac", "abbaacc", "aacceee", "acceeee", "ceeeeea",
  #    "aabbaacc", "aacceeee", "acceeeee", "cceeeeea", "ceeeeeaa", "aacceeeee",
  #    "acceeeeea", "cceeeeeaa", "aacceeeeea", "acceeeeeaa", "aacceeeeeaa"] 
pull_subs(str, 2)
  #=> ["ab", "ba", "ac", "ce", "ea", "aab", "abb", "bba", "baa", "aac", "acc",
  #    "cce", "cee", "eea", "eaa", "aabb", "abba", "bbaa", "aacc", "ccee",
  #    "ceee", "eeea", "eeaa", "aabba", "abbaa", "cceee", "ceeee", "eeeea", 
  #    "eeeaa", "aabbaa", "cceeee", "ceeeee", "eeeeea", "eeeeaa", "cceeeee", 
  #    "eeeeeaa"] 
pull_subs(str, 4)
  #=> ["baacce", "bbaacce", "baaccee", "abbaacce", "bbaaccee", "baacceee", 
  #    "aabbaacce", "abbaaccee", "bbaacceee", "baacceeee", "aabbaaccee", 
  #    "abbaacceee", "bbaacceeee", "baacceeeee", "aabbaacceee", "abbaacceeee",
  #    "bbaacceeeee", "baacceeeeea", "aabbaacceeee", "abbaacceeeee",
  #    "bbaacceeeeea", "baacceeeeeaa", "aabbaacceeeee", "abbaacceeeeea", 
  #    "bbaacceeeeeaa", "aabbaacceeeeea", "abbaacceeeeeaa", "aabbaacceeeeeaa"] 
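
If only the four substrings listed in the question are wanted, the same non-regex idea can be narrowed down. The following is a minimal sketch, not part of the answer above: maximal_run_subs is a hypothetical helper name, and it assumes that a wanted substring must start at the beginning of a run of identical characters and extend as far as possible without needing an (n + 1)-th distinct character.

```ruby
# Hypothetical helper: for each run of identical characters, grow a window
# run by run while it holds at most n distinct characters, and keep the
# window only if it ends up with exactly n distinct characters.
def maximal_run_subs(str, n)
  # "aabbaacc" -> ["aa", "bb", "aa", "cc"]
  runs = str.chars.chunk_while { |a, b| a == b }.map(&:join)
  runs.each_index.map do |start|
    seen = []
    taken = runs.drop(start).take_while do |run|
      seen |= [run[0]]   # record the character of this run
      seen.size <= n     # stop once an (n + 1)-th character shows up
    end
    taken.join if taken.map { |run| run[0] }.uniq.size == n
  end.compact
end

maximal_run_subs("aabbaacceeeeeaa", 3)
  #=> ["aabbaacc", "bbaacc", "aacceeeeeaa", "cceeeeeaa"]
```

Working on whole runs rather than single characters is what keeps a window from starting or stopping in the middle of a repeated letter.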

Upvotes: 3

sawa

Reputation: 168199

This cannot be done with a plain scan, because scan returns non-overlapping matches and the expected substrings overlap. The best way to do it is to try a match at each start index.
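
To see the overlap problem concretely, here is a small sketch that collects the full matches a scan produces for a pattern of the same shape as the one in this answer. The variable names re and found are only for illustration.

```ruby
s = "aabbaacceeeeeaa"
re = /(.)\1*+(.)(?:\1|\2)*+(.)(?:\1|\2|\3)*/

# scan resumes searching after the end of the previous match, so any
# substring that overlaps "aabbaacc" is never reported.
found = s.to_enum(:scan, re).map { Regexp.last_match(0) }
# found => ["aabbaacc"]
```

The first match consumes indices 0 to 7, and from index 8 onward only two distinct letters remain, so nothing else is found.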

It is also difficult for a regex alone to exclude matches that start in the middle of a run of identical letters, so the code below checks that on the indices instead.

s = "aabbaacceeeeeaa"

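(0...s.length).map do |i|
  # Skip positions in the middle of a run of identical letters.
  next if i > 0 && s[i - 1] == s[i]
  # \G anchors the match at index i, so a later match cannot be picked up.
  /\G(.)\1*+(.)(?:\1|\2)*+(.)(?:\1|\2|\3)*/.match(s, i)&.[](0)
end.compact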
# => ["aabbaacc", "bbaacc", "aacceeeeeaa", "cceeeeeaa"]

Upvotes: 3
