V K Singh
V K Singh

Reputation: 1254

Smallest substring consisting of maximum distinct characters of a String

Given a string, I need to find the smallest substring that contains all unique characters in the string. Here are three examples:

Input: "AABBBCBB"    Shortest substring: "ABBBC"    
Input: "AABBBCBBAC", Shortest substring: "BAC"        
Input: "aabcaadcc",  Shortest substring: "bcaad"

The unique characters in the first substring are 'A', 'B' and 'C'. The substrings that contain those characters are 'AABBBC', 'AABBBCB', 'AABBBCBB', 'ABBBC', 'ABBBCB' and 'ABBBCBB'. The shortest of these is 'ABBBC'. If there are two or more shortest substrings any one can be returned.

Upvotes: 0

Views: 2235

Answers (2)

Cary Swoveland
Cary Swoveland

Reputation: 110675

Code

def doit(str)
  uniq_chars = str.each_char.uniq
  nbr_uniq_chars = uniq_chars.size
  last_idx = str.size - 1
  shortest = { idx: 0, size: str.size }
  last_start_idx = last_idx - nbr_uniq_chars + 1
  (0..last_start_idx).each do |start_idx|
    first_end_idx = start_idx + nbr_uniq_chars - 1
    last_end_idx = start_idx + shortest[:size] - 1
    (first_end_idx..last_end_idx).each do |end_idx|
      if (uniq_chars - str[start_idx..end_idx].chars).empty?
        shortest = { idx: start_idx,
                     size: end_idx - start_idx + 1 }
        break
      end
    end
  end
  str[shortest[:idx], shortest[:size]]
end

Examples

doit "AABBBCBB"   #=> "ABBBC" 
doit "AABBBCBBAC" #=> "BAC" 
doit "aabcaadcc"  #=> "bcaad"

Explanation

Suppose:

str = "AABBBCBB"

The steps are as follows.

uniq_chars = str.each_char.uniq
  #=> ["A", "B", "C"] 
nbr_uniq_chars = uniq_chars.size
  #=> 3 
last_idx = str.size - 1
  #=> 7 
shortest = { idx: 0, size: str.size }
  #=> {:idx=>0, :size=>8} 

shortest describes the shortest subtring found so far. It is the substring

str[shortest[:idx], shortest[:size]]

Initially it describes the entire string. Continuing,

last_start_idx = last_idx - nbr_uniq_chars + 1
  #=> 5

I will be fixing the starting index, start_idx, initially at zero, and then consider all substrings that begin with that index. There is no reason to consider start_idx > last_idx as in that case str[start_idx..-1].size < nbr_uniq_chars, and therefore is not a possibility.

enum1 = (0..last_start_idx).each
  #=> #<Enumerator: 0..5:each> 
start_idx = enum1.next
  #=> 0 
first_end_idx = start_idx + nbr_uniq_chars - 1
  #=> 3 
last_end_idx = start_idx + shortest[:size] - 1
  #=> 7

enum2 = (first_end_idx..last_end_idx).each
  #=> #<Enumerator: 3..4:each> 
end_idx = enum2.next
  #=> 2 
a = str[start_idx..end_idx].chars
  #=> str[0..2].chars
  #=> ["A", "A", "B"] 
b = uniq_chars - a
  #=> ["A", "B", "C"] - ["A", "A", "B"] 
  #=> ["C"]
b.empty?
  #=> false, so do not update `shortest`

end_idx = enum2.next
  #=> 3 
a = str[start_idx..end_idx].chars
  #=> ["A", "A", "B", "B"] 
b = uniq_chars - a
  #=> ["C"] 
b.empty?
  #=> false, so do not update `shortest`

end_idx = enum2.next
  #=> 4 
a = str[start_idx..end_idx].chars
  #=> ["A", "A", "B", "B", "B"] 
b = uniq_chars - a
  #=> ["C"] 
b.empty?
  #=> false, so do not update `shortest`

end_idx = enum2.next
  #=> 5 
a = str[start_idx..end_idx].chars
  #=> ["A", "A", "B", "B", "B", "C"] 
b = uniq_chars - a
  #=> [] 
b.empty?
  #=> true

We now have found a substring that contains all unique characters in the string, but more than that we know that it is shorter than the previous shortest substring. (No need to test!) Therefore, we update shortest:

shortest = { idx: start_idx, size: end_idx - start_idx + 1 }
  #=> {:idx=>0, :size=>6}

which describes the substring:

str[shortest[:idx], shortest[:size]]
  #=> "AABBBC"

We no longer need to execute end_idx = enum2.next for this value of start_idx because we know that the associated substring would begin with the string just identified, and therefore would have all unique characters in str but would be longer than the substring just found. We therefore execute:

break

terminating the inner loop. The next step is to generate the second element of enum1 and go from there:

start_idx = enum1.next
  #=> 1 
first_end_idx = start_idx + nbr_uniq_chars - 1
  #=> 3 
last_end_idx = start_idx + shortest[:size] - 1
  #=> 6

This will result in shortest being updated (for the last time) to:

shortest
  #=> { idx: 1, size: 5 }

The remaining calculations are similar.

Upvotes: 1

V K Singh
V K Singh

Reputation: 1254

I have solved this algoritm with below approach.

def max_distinct_char(str)
  str.chars.uniq.count
end

def smallest_subset_str(str)
  str_length = str.length
  max_distinct = max_distinct_char(str)

  min_str_len = str_length

  for j in (0..str_length)
    for k in (0..str_length)
      sub_str = str[j..k]
      sub_str_length = sub_str.length
      sub_distinct_char_length = max_distinct_char(sub_str)

      if (sub_str_length < min_str_len && max_distinct == sub_distinct_char_length)
        min_str_len = sub_str_length
        sub_string = sub_str
      end
    end
  end
  sub_string
end

Using the above methods, we can get results as

smallest_subset_str "AABBBCBB"   #=> "ABBBC" 
smallest_subset_str "AABBBCBBAC" #=> "BAC" 
smallest_subset_str "aabcaadcc"  #=> "bcaad"

Upvotes: 0

Related Questions