Reputation: 235
I would like to select the parts of a string covered by a set of substrings. The substrings may overlap and may appear in any order.
For example:
string = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
substring1 = "HPGDFGADAQGAMTKALELFR"
substring2 = "GEWQQVLNVWGK"
substringn = "ALELFRNDIAAKYK"
And I would like to get:
coverage = "MGLSD<b>GEWQQVLNVWGK</b>VEADIAGHGQEVLIHSK<b>HPGDFGADAQGAMTKALELFRNDIAAKYK</b>ELGFQG"
I tried to extract the positions of the substrings within the string like this:
substrings_array.each do |substring|
  start_pos = string.index substring
  end_pos = string.length - (string.reverse.index(substring.reverse))
end
That way I get a start and an end position for each substring. How could I merge them all, especially considering that they may overlap and appear in different orders? Is this even a good strategy?
Upvotes: 1
Views: 89
Reputation: 19338
This should work (not pretty, but it works):
string = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
substring1 = "HPGDFGADAQGAMTKALELFR"
substring2 = "GEWQQVLNVWGK"
substring3 = "ALELFRNDIAAKYK"
substrings = [substring1, substring2, substring3]
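# Assumes every substring occurs in string; String#index returns nil otherwise.
overlapping_indexes = substrings.map do |substring|
  start_pos = string.index substring
  # end_pos is the index just past the match, which is where the closing tag goes.
  end_pos = start_pos + substring.length
  (start_pos..end_pos)
end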
# the following 3 methods are from Wayne Conrad in this question: http://stackoverflow.com/questions/6017523/how-to-combine-overlapping-time-ranges-time-ranges-union
def ranges_overlap?(a, b)
  a.include?(b.begin) || b.include?(a.begin)
end

def merge_ranges(a, b)
  [a.begin, b.begin].min..[a.end, b.end].max
end

def merge_overlapping_ranges(ranges)
  ranges.sort_by(&:begin).inject([]) do |ranges, range|
    if !ranges.empty? && ranges_overlap?(ranges.last, range)
      ranges[0...-1] + [merge_ranges(ranges.last, range)]
    else
      ranges + [range]
    end
  end
end
indexes = merge_overlapping_ranges(overlapping_indexes)
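x = "<b>"
y = "</b>"
# String#insert modifies string in place, so offset tracks the length of the tags already inserted.
offset = 0
indexes.each do |index|
  string.insert(index.begin + offset, x)
  offset += x.length
  string.insert(index.end + offset, y)
  offset += y.length
end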
p string
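The same idea can probably be written more compactly. This is only a sketch, not a drop-in replacement: `mark_coverage` is a name made up here, it assumes Ruby 2.7+ (for `filter_map`), and like the code above it only looks at the first occurrence of each substring. It skips substrings that are not found, and it inserts the tags from right to left on a copy, so no offset bookkeeping is needed and the original string is left untouched.

```ruby
# Hypothetical helper (not part of the original answer): wraps every region of
# `string` covered by at least one of `substrings` in open_tag / close_tag.
def mark_coverage(string, substrings, open_tag = "<b>", close_tag = "</b>")
  # Collect [start, exclusive_end] pairs for the first occurrence of each
  # substring; empty or missing substrings are skipped.
  spans = substrings.filter_map do |sub|
    next if sub.empty?
    start = string.index(sub)
    [start, start + sub.length] if start
  end

  # Sort by start position, then merge spans that overlap or touch.
  merged = spans.sort.each_with_object([]) do |(start, stop), acc|
    if acc.any? && start <= acc.last[1]
      acc.last[1] = [acc.last[1], stop].max
    else
      acc << [start, stop]
    end
  end

  # Insert tags from the rightmost span to the leftmost so that earlier
  # positions stay valid; work on a copy to avoid mutating the argument.
  merged.reverse_each.with_object(string.dup) do |(start, stop), out|
    out.insert(stop, close_tag)
    out.insert(start, open_tag)
  end
end

string = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
substrings = ["HPGDFGADAQGAMTKALELFR", "GEWQQVLNVWGK", "ALELFRNDIAAKYK"]

puts mark_coverage(string, substrings)
# prints MGLSD<b>GEWQQVLNVWGK</b>VEADIAGHGQEVLIHSK<b>HPGDFGADAQGAMTKALELFRNDIAAKYK</b>ELGFQG
```

If a substring can occur more than once and every occurrence should be marked, the `string.index(sub)` step would need to be replaced by a scan over all match positions.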
Upvotes: 1