user0

Reputation: 195

Performance of checking "expanded list" equality

I need to check whether a given list is equal to the result of substituting some lists for some elements in another list. Concretely, I have a dictionary, say f = {'o': ['a', 'b'], 'l': ['z'], 'x': ['y']} and a list list1 = ['H', 'e', 'l', 'l', 'o'], so I want to check if some list2 is equal to ['H', 'e', 'z', 'z', 'a', 'b'].

Below, I first write a function apply to compute the image of list1 under f; then it suffices to write list2 == apply(list1, f). Since this check will be performed thousands of times in my program, I need it to be very fast. Therefore, I wrote the second function below, which I expected to be faster but which turns out not to be. So my questions (detailed below) are: why? And is there a faster method?

First function:

def apply(l, f):
    result = []
    for x in l:
        if x in f:
            result.extend(f[x])
        else:
            result.append(x)
    return result
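
For example, with the data above:

f = {'o': ['a', 'b'], 'l': ['z'], 'x': ['y']}
list1 = ['H', 'e', 'l', 'l', 'o']
print(apply(list1, f))  # ['H', 'e', 'z', 'z', 'a', 'b']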

Second function:

def apply_equal(list1, f, list2):
    i = 0
    for x in list1:
        if x in f:
            sublist = f[x]
            length = len(sublist)
            if list2[i:i + length] != sublist:
                return False
            i += length
        else:
            if i >= len(list2) or list2[i] != x:  # guard against list2 running out
                return False
            i += 1
    return i == len(list2)

I thought the second method would be faster since it does not first construct the image of list1 under f and then check equality with list2. Instead, it checks equality "on the fly", without building a new list. So I was surprised to see that it is not faster (and is even a bit slower). For the record: list1, list2, and the lists that are values in the dictionary are all small (typically under 50 elements), as is the number of keys in the dictionary.

So my questions are: why isn't the second method faster? And are there ways to do this faster (possibly using other data structures)?

Edit in response to the comments: list1 and list2 will most often be different, but f may be common to some of them. Typically, there could be around 100,000 checks in batches of around 50 consecutive checks with a common f. The elements of the lists are short strings. It is expected that all checks return True (so the whole lists have to be iterated over).
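
To make the comparison concrete, here is the kind of harness I use; the data below is synthetic and only roughly shaped like the real thing (all sizes are made up), and it reuses the two functions above:

import random
import string
import timeit

random.seed(0)
words = [''.join(random.choices(string.ascii_lowercase, k=3)) for _ in range(200)]
f = {w: random.sample(words, 3) for w in words[:20]}  # ~20 keys, short list values
list1 = random.choices(words, k=50)
list2 = apply(list1, f)  # so every check returns True, as in the real workload

print(timeit.timeit(lambda: list2 == apply(list1, f), number=100_000))
print(timeit.timeit(lambda: apply_equal(list1, f, list2), number=100_000))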

Upvotes: 2

Views: 77

Answers (2)

Kelly Bundy

Reputation: 27599

Without proper data for benchmarking it's hard to say, so I tested various solutions with various "sizes" of data.

  1. Replacing result.extend(f[x]) with result += f[x] always made it faster.

  2. This was faster for longer lists (using itertools.chain):

    from itertools import chain

    list2 == list(chain.from_iterable(map(f.get, list1, zip(list1))))

     Here zip(list1) supplies the 1-tuple (x,) as the default argument to f.get, so elements missing from f pass through unchanged.
    
  3. If the data allows it, explicitly storing all possible keys and always accessing with f[x] would speed it up. That is, set f[k] = [k] for all "missing" keys in advance, so you don't have to check with in or use get. (See the sketch after this list.)
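
For illustration, here is a sketch combining points 2 and 3 (all_elements is a hypothetical name for the set of every element that can occur in an input list):

from itertools import chain

# all_elements is a hypothetical stand-in for the universe of possible elements.
g = {x: f.get(x, [x]) for x in all_elements}

def apply_equal_fast(list1, g, list2):
    # every element is now a key, so no `in` check or .get default is needed
    return list2 == list(chain.from_iterable(map(g.__getitem__, list1)))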

Upvotes: 1

Julien Palard

Reputation: 11546

You need to use profiling tools like scalene to see what's slow in your code; don't try to guess.

In case you want to read it, I was able to produce an even slower version based on your idea of stopping as soon as possible, while keeping the readable first apply implementation:

def apply(l, f):
    # Generator version: lazily yields the expanded elements one by one.
    for x in l:
        if x in f:
            yield from f[x]
        else:
            yield x


def apply_equal(l1, f, l2):
    # Compare lazily, element by element; strict=True also checks the lengths.
    return all(left == right for left, right in zip(apply(l1, f), l2, strict=True))

Beware: it needs Python 3.10 or later for zip's strict=True.
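
Note also that with strict=True a length mismatch raises ValueError instead of making apply_equal return False; if you want a plain boolean in that case, one possible wrapper (just a sketch) is:

def apply_equal_bool(l1, f, l2):
    # zip(strict=True) raises ValueError when the lengths differ,
    # so translate that into False
    try:
        return apply_equal(l1, f, l2)
    except ValueError:
        return False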

As the comments said, speed depends heavily on your data here: constructing the whole list may be faster on small datasets, but halting early may win on bigger lists.

Upvotes: 1
