naiveai
naiveai

Reputation: 853

Efficiently finding the longest matching prefix string

My current implementation is this:

def find_longest_matching_option(option, options):
    options = sorted(options, key=len)
    longest_matching_option = None
    for valid_option in options:
        # Don't want to treat "oreo" as matching "o",
        # match only if it's "o reo"
        if re.match(ur"^{}\s+".format(valid_option), option.strip()):
            longest_matching_option = valid_option
    return longest_matching_option

Some examples of what I'm trying to do:

"foo bar baz something", ["foo", "foo bar", "foo bar baz"]
# -> "foo bar baz"
"foo bar bazsomething", (same as above)
# -> "foo bar"
"hello world", ["hello", "something_else"]
# -> "hello"
"a b", ["a", "a b"]
# -> "a b" # Doesn't work in current impl.

Mostly, I'm looking for efficiency here. The current implementation works, but I've been told it's O(m^2 * n), which is pretty bad.

Thanks in advance!

Upvotes: 1

Views: 1565

Answers (2)

cs95
cs95

Reputation: 403278

Let's start with foo.

def foo(x, y):
    x, y = x.strip(), y.strip()
    return x == y or x.startswith(y + " ")

foo returns true either if two strings are equal, or one (plus a space) is a substring of the other.

Next, given a case string, and a list of options, you can use filter to find all valid substrings for the given case string, and then apply max to find the longest one (see tests below).

Here's a few test cases for foo. For the purpose of demonstrating, I'll use partial to curry foo to a higher order function.

from functools import partial

cases = ["foo bar baz something", "foo bar bazsomething", "hello world", "a b", "a b"]
options = [
      ["foo", "foo bar", "foo bar baz"], 
      ["foo", "foo bar", "foo bar baz"],
      ["hello", "something_else"],
      ["a", "a b"],
      ["a", "a b\t"]
]
p_list = [partial(foo, c) for c in cases]

for p, o in zip(p_list, options):
    print(max(filter(p, o), key=len))

foo bar baz
foo bar
hello
a b
a b

Upvotes: 2

Aran-Fey
Aran-Fey

Reputation: 43356

Regex is overkill here; you can simply append a space to each string before comparing them to get the same result.

You also don't need to sort the data. It's more efficient to simply loop over every value.

def find_longest_matching_option(option, options):
    # append a space so that find_longest_matching_option("a b", ["a b"])
    # works as expected
    option += ' '
    longest = None

    for valid_option in options:
        # append a space to each option so that only complete
        # words are matched
        valid_option += ' '
        if option.startswith(valid_option):
            # remember the longest match
            if longest is None or len(longest) < len(valid_option):
                longest = valid_option

    if longest is not None:
        # remove the trailing space
        longest = longest[:-1]
    return longest

Upvotes: 1

Related Questions