formicaman
formicaman

Reputation: 1357

Efficiently combine two uneven lists into dictionary based on condition

I have two lists of tuples and I want to map elements in e to elements in s based on a condition. The condition is that the 1st element of something in e needs to be >= to the 1st element in s and elements 1+2 in e need to be <= to elements 1+2 in s. The 1st number in each s tuple is a start position and the second is the length. I can do it as follows:

e = [('e1',3,3),('e2',6,2),('e3',330,3)]
s = [('s1',0,10),('s2',11,24),('s3',35,35),('s4',320,29)]

for i in e:
    d = {i:j for j in s if i[1]>=j[1] and i[1]+i[2]<=j[1]+j[2]}
    print(d)

Output (what I want):

{('e1', 3, 3): ('s1', 0, 10)}
{('e2', 6, 2): ('s1', 0, 10)}
{('e3', 330, 3): ('s4', 320, 29)}

Is there a more efficient way to get to this result, ideally without the for loop (at least not a double loop)? I've have tried some things with zip as well as something along the lines of

list(map(lambda i,j: {i:j} if i[1]>=j[1] and i[1]+i[2]<=j[1]+j[2] else None, e, s))

but it is not giving me quite what I am looking for.

The elements in s will never overlap. For example, you wouldn't have ('s1',0,10) and ('s2', 5,15). In other words, the range (0, 0+10) will never overlap with (5,5+15) or that of any other tuple. Additionally, all the tuples in e will be unique.

Upvotes: 1

Views: 246

Answers (1)

Hans Musgrave
Hans Musgrave

Reputation: 7141

The constraint that the tuples in s can't overlap is pretty strong. In particular, it implies that each value in e can only match with at most one value in s (I think the easiest way to show that is to assume two distinct, non-overlapping tuples in s match with a single element in e and derive a contradiction).

Because of that property, any s-tuple s1 matching an e-tuple e1 has the property that among all tuples sx in s with sx[1]<=e1[1], it is the one with the greatest sum(sx[1:]), since if it weren't then the s-tuple with a small enough 1st coordinate and greater sum would also be a match (which we already know is impossible).

That observation lends itself to a fairly simple algorithm where we linearly walk through both e and s (sorted), keeping track of the s-tuple with the biggest sum. If that sum is big enough compared to the e-tuple we're looking at, add the pair to our result set.

def pairs(e, s):
  # Might be able to skip or replace with .sort() depending
  # on the rest of your program
  e = sorted(e, key=lambda t: t[1])
  s = sorted(s, key=lambda t: t[1])

  i, max_seen, max_sum, results = 0, None, None, {}
  for ex in e:
    while i < len(s) and (sx:=s[i])[1] <= ex[1]:
      if not max_seen or sum(sx[1:]) > max_sum:
        max_seen, max_sum = sx, sum(sx[1:])
      i += 1
    
    if max_seen and max_sum > sum(ex[1:]):
      results[ex] = s[i]

  return results

Upvotes: 1

Related Questions