Reputation: 1254
Given a string, I need to find the smallest substring that contains all unique characters in the string. Here are three examples:
Input: "AABBBCBB" Shortest substring: "ABBBC"
Input: "AABBBCBBAC", Shortest substring: "BAC"
Input: "aabcaadcc", Shortest substring: "bcaad"
The unique characters in the first substring are 'A'
, 'B'
and 'C'
. The substrings that contain those characters are 'AABBBC'
, 'AABBBCB'
, 'AABBBCBB'
, 'ABBBC'
, 'ABBBCB'
and 'ABBBCBB'
. The shortest of these is 'ABBBC'
. If there are two or more shortest substrings any one can be returned.
Upvotes: 0
Views: 2235
Reputation: 110675
Code
def doit(str)
uniq_chars = str.each_char.uniq
nbr_uniq_chars = uniq_chars.size
last_idx = str.size - 1
shortest = { idx: 0, size: str.size }
last_start_idx = last_idx - nbr_uniq_chars + 1
(0..last_start_idx).each do |start_idx|
first_end_idx = start_idx + nbr_uniq_chars - 1
last_end_idx = start_idx + shortest[:size] - 1
(first_end_idx..last_end_idx).each do |end_idx|
if (uniq_chars - str[start_idx..end_idx].chars).empty?
shortest = { idx: start_idx,
size: end_idx - start_idx + 1 }
break
end
end
end
str[shortest[:idx], shortest[:size]]
end
Examples
doit "AABBBCBB" #=> "ABBBC"
doit "AABBBCBBAC" #=> "BAC"
doit "aabcaadcc" #=> "bcaad"
Explanation
Suppose:
str = "AABBBCBB"
The steps are as follows.
uniq_chars = str.each_char.uniq
#=> ["A", "B", "C"]
nbr_uniq_chars = uniq_chars.size
#=> 3
last_idx = str.size - 1
#=> 7
shortest = { idx: 0, size: str.size }
#=> {:idx=>0, :size=>8}
shortest
describes the shortest subtring found so far. It is the substring
str[shortest[:idx], shortest[:size]]
Initially it describes the entire string. Continuing,
last_start_idx = last_idx - nbr_uniq_chars + 1
#=> 5
I will be fixing the starting index, start_idx
, initially at zero, and then consider all substrings that begin with that index. There is no reason to consider start_idx > last_idx
as in that case str[start_idx..-1].size < nbr_uniq_chars
, and therefore is not a possibility.
enum1 = (0..last_start_idx).each
#=> #<Enumerator: 0..5:each>
start_idx = enum1.next
#=> 0
first_end_idx = start_idx + nbr_uniq_chars - 1
#=> 3
last_end_idx = start_idx + shortest[:size] - 1
#=> 7
enum2 = (first_end_idx..last_end_idx).each
#=> #<Enumerator: 3..4:each>
end_idx = enum2.next
#=> 2
a = str[start_idx..end_idx].chars
#=> str[0..2].chars
#=> ["A", "A", "B"]
b = uniq_chars - a
#=> ["A", "B", "C"] - ["A", "A", "B"]
#=> ["C"]
b.empty?
#=> false, so do not update `shortest`
end_idx = enum2.next
#=> 3
a = str[start_idx..end_idx].chars
#=> ["A", "A", "B", "B"]
b = uniq_chars - a
#=> ["C"]
b.empty?
#=> false, so do not update `shortest`
end_idx = enum2.next
#=> 4
a = str[start_idx..end_idx].chars
#=> ["A", "A", "B", "B", "B"]
b = uniq_chars - a
#=> ["C"]
b.empty?
#=> false, so do not update `shortest`
end_idx = enum2.next
#=> 5
a = str[start_idx..end_idx].chars
#=> ["A", "A", "B", "B", "B", "C"]
b = uniq_chars - a
#=> []
b.empty?
#=> true
We now have found a substring that contains all unique characters in the string, but more than that we know that it is shorter than the previous shortest substring. (No need to test!) Therefore, we update shortest
:
shortest = { idx: start_idx, size: end_idx - start_idx + 1 }
#=> {:idx=>0, :size=>6}
which describes the substring:
str[shortest[:idx], shortest[:size]]
#=> "AABBBC"
We no longer need to execute end_idx = enum2.next
for this value of start_idx
because we know that the associated substring would begin with the string just identified, and therefore would have all unique characters in str
but would be longer than the substring just found. We therefore execute:
break
terminating the inner loop. The next step is to generate the second element of enum1
and go from there:
start_idx = enum1.next
#=> 1
first_end_idx = start_idx + nbr_uniq_chars - 1
#=> 3
last_end_idx = start_idx + shortest[:size] - 1
#=> 6
This will result in shortest
being updated (for the last time) to:
shortest
#=> { idx: 1, size: 5 }
The remaining calculations are similar.
Upvotes: 1
Reputation: 1254
I have solved this algoritm with below approach.
def max_distinct_char(str)
str.chars.uniq.count
end
def smallest_subset_str(str)
str_length = str.length
max_distinct = max_distinct_char(str)
min_str_len = str_length
for j in (0..str_length)
for k in (0..str_length)
sub_str = str[j..k]
sub_str_length = sub_str.length
sub_distinct_char_length = max_distinct_char(sub_str)
if (sub_str_length < min_str_len && max_distinct == sub_distinct_char_length)
min_str_len = sub_str_length
sub_string = sub_str
end
end
end
sub_string
end
Using the above methods, we can get results as
smallest_subset_str "AABBBCBB" #=> "ABBBC"
smallest_subset_str "AABBBCBBAC" #=> "BAC"
smallest_subset_str "aabcaadcc" #=> "bcaad"
Upvotes: 0