Reputation: 246

Zip lists by function output in Python

It is known that if we have two lists in python l1 and l2, one can use zip(l1, l2) in order to create a list of tuples from both l1 and l2.

Now, lets assume that we have two functions f1 and f2. I would like to create a list of tuples with elements from l1 and l2 (like the zip output) such that the output of f1 on the element from l1 will be equal to the output of f2 on the element from l2. In mathematic form:

In my use case, one can assume that f1 and f2 are both injective functions. But, it is not guaranteed that every element of l1 has a match in l2 and vise versa.

Here's a code that does that:

from itetools import product

tuples_list = [
    (e1, e2)
    for e1, e2 in product(l1, l2)
    if f1(e1) == f2(e2)
]

I wonder if there's a better way of doing it (maybe it's already implemented in a built-in library in a faster/cleaner way).

Any help would be much appreciated.

Upvotes: 1

Answers (3)

waykiki

Reputation: 1094

I like the solutions posted by Rui Reis and SMeznaric and they inspired me to write my own.

My solution is less intuitive, but I think it works faster.

Code:

def find_pairs(l1, l2, f1, f2):
    d = {}
    for e1 in l1:
        d[f1(e1)] = e1

    tuples_list = []
    for e2 in l2:
        if f2(e2) in d:
            tuples_list.append((d[f2(e2)], e2))

    return tuples_list

Explanation:

Create an empty dictionary.
Fill the dictionary with [f1, e1] pairs. That is, for every element e1 in list l1, compute the value of the function f1. That value (f1) will be the key for the dictionary, and the element itself (e1) will be the value. Note that this only requires one pass through the first list.
(Crucial part) Now, we pass through list l2. For every element e2 in list l2, we compute the value of the function f2. Now we check whether this computed function value already exists in our dictionary. This operation is O(1) fast! (https://wiki.python.org/moin/TimeComplexity). If the value already exists in the dictionary, that means that we have a case that f1(e1) == f2(e2). We only don't know what e1 is. But, since we cached it in the dictionary as the correspoding key-value pair, we get e1 by getting the value from the dictionary under key f1. And we know f1, since f1 == f2(e2). So: d[f2(e2)] == e1.

Notes:

This should be quite fast since f1 is applied only once on every element of l1, and f2 is applied only once on every element of f2.
In the actual code provided, f2(e2) is computed twice. I did this for readability, but we can easily memorise f2(e2) into a variable.
The speed-up relies on storing the function values as keys in a dictionary, so that time isn't spent searching for matches.
If f1 and f2 are real functions (return floats), they can still be used as dictionary keys: Float values as dictionary key

Upvotes: 0

SMeznaric

Reputation: 492

If your functions are expensive to compute you might want to precompute f1 and f2 before doing the product. Instead of computing the functions NxM times you can compute them N+M times. You could do

l1_out = [(f1(el_l1), el_l1) for el_l1 in l1]
l2_out = [(f1(el_l2), el_l2) for el_l2 in l2]

tuples_list = [(x[1], y[1]) for x in l1_out for y in l2_out if x[0] == y[0]]

Upvotes: 2

Rui Reis

Reputation: 63

Please consider that itertools.product is in essence a wrapper for a nested loop. So a solution which doesn't require external modules would be:

tuples_list = [
    (e1, e2)
    for e1 in l1
    for e2 in l2
    if f1(e1) == f2(e2)
]

However, this solution can still be optimized. Since you resort to the cartesian product, this means you are re-calculating the f1 and f2 functions various times for each of the input values, which means you are adding needless complexity. You could cache the output values of the functions, my solution for a faster solution is:

lf1 = [f1(v) for v in l1]
lf2 = [f2(v) for v in l2]

tuples_list = [
    (l1[i1], l2[i2])
    for i1, fe1 in enumerate(lf1)
    for i2, fe2 in enumerate(lf2)
    if fe1 == fe2
]

Upvotes: 2

Zip lists by function output in Python

Answers (3)

Related Questions