find the minimum number of words(distance) between repeated occurrences of a search string in the input string

Question

Here are test cases for the code:

- string - 'Tim had been saying that he had been there'
- search - 'had'
- expected output - 4
- string - 'he got what he got and what he wanted'
- search - 'he'
- expected out - 2

def return_distance(input, search):
    words = input.split()
    distance = None
    
    indx = []
    if not input or not search:
        return None
    else:
        if words.count(search) >1:
            indx = [ index for index, word in enumerate(words) if word == search]
            distance = indx[1] - indx[0]
            for i  in range(len(indx)-1):
                distance = min(distance, indx[i+1] - indx[i])-1
    
    return distance

I am thinking how to optimize the code. I admit it is poorly written.

ggorlen · Accepted Answer

How about

def min_distance_between_words(sentence, word):
    idxes = [i for i, e in enumerate(sentence.split()) if e == word]
    return min([y - x - 1 for x, y in zip(idxes, idxes[1:])])

This splits the input sentence, makes a list of every index that matches the target word, then iterates over this list to compute the differences between each index and returns the minimum difference.

Since behavior is unspecified when the sentence doesn't have a word, it raises an error but you can add a check for this and return the value of your choice if desired using min's default parameter:

def min_distance_between_words(sentence, word):
    idxes = [i for i, e in enumerate(sentence.split()) if e == word]
    return min([y - x - 1 for x, y in zip(idxes, idxes[1:])], default=None)

As an aside, naming a variable input overwrites a builtin and return_distance is a rather ambiguous name for a function.

Adding a precondition for parameters for None as done with if not input or not search: is not typically done in Python (we assume caller will always pass in a string and adhere to the function's contract).

If you want to generalize this further, move the split() duty to the domain of the caller which enables the function to operate on arbitrary iterables:

def min_distance_between_occurrences(it, target):
    idxes = [i for i, e in enumerate(it) if e == target]
    return min([y - x - 1 for x, y in zip(idxes, idxes[1:])], default=None)

Call with:

min_distance_between_occurrences("a b c a".split(), "a")
min_distance_between_occurrences([(1, 2), (1, 3), (1, 2)], (1, 2))

Refactoring aside, as pointed out in the comments, the original code isn't correct. Issues include:

search_str does not exist. You probably meant search.
distance and min_dist don't really work together. Pick one or the other and use it for all minimum calculations.
min(min_dist, indx[i+1] - indx[i])-1 subtracts 1 in the wrong place, throwing off the count.

Here's a potential fix for these issues:

def return_distance(input, search):
    words = input.split()
    distance = None

    if words.count(search) > 1:
        indx = [index for index, word in enumerate(words) if word == search]
        distance = indx[1] - indx[0] - 1
        #                           ^^^^

        for i  in range(len(indx) - 1):
            distance = min(distance, indx[i+1] - indx[i] - 1)
            #                                           ^^^^
            
    return distance

find the minimum number of words(distance) between repeated occurrences of a search string in the input string

Answers (2)

Related Questions