Reputation: 308
I'm trying to come up with an algorithm for finding the shortest distance between a list of words. I have a dictionary of lists that shows different locations in a document a word is found.
matches{"the" : [2, 24, 15], "is" : [5, 13], "apple" : {45} ... }
Are there established algorithms for finding the shortest length where all of these overlap? For example, in this one 13-45 would be the answer because all of the words can be found in that range.
Upvotes: 0
Views: 67
Reputation: 1583
I'd keep two positions, left
and right
which are, respectively, the left and right ends of a range that contains all the words. I'd also maintain a priority queue, each entry of which is a word and the list of places the word occurs at or after the current left edge.
To initialize, make a new empty priority queue, insert each word with it's full list of occurrences, properly sorted. As you insert each word, update right
so it's the maximum first occurrence of any word. For your data, the initial setup would be
left=2,right=45,queue=[["the", [2,15,24]], ["is", [5, 13], ["apple", [45]]
I'm showing the priority queue as if it were an array, sorted by the first component of its second component. That is, in the order 2 (for "the"), 5 (for "is"), and 45 (for "apple"). Note that the occurrences for "the" had to be sorted during this initialization. right
turns out to be 45, the maximum of 2, 5, and 45.
left
is implicit. It's always the first occurrence of whatever's at the front of the priority queue. At this point the shortest range we've found is 2..45.
Then repeat the following loop:
remove the first entry from the priority queue
shift its next occurrence into `left`
check if left..right is a new shortest sequence
if we've shifted off the last occurrence for this entry
stop
otherwise,
update `right` to include this new next occurrence
insert the entry back into the priority queue
With your data, the successive values would be:
left=2,right=45,queue=[["the", [2,15,24]], ["is", [5, 13], ["apple", [45]]
left=5,right=45,queue=[["is", [5, 13], ["the", [15,24]], ["apple", [45]]
left=13,right=45,queue=[["is", [13], ["the", [15,24]], ["apple", [45]]
and we terminate, because after popping ["is", [13]]
from the queue and shifting 13 off its list of occurrences, none remain.
Upvotes: 1