jbogart

Reputation: 141

Return only unique pairs of two indexes of lists within a list

I have a list of lists like [name1,name2,val1,val2]:

[['a','b',4,5],
['x','y',2,10],
['b','a',5,4],
['d','y',8,10],
['y','d',10,8],
['a','d',4,8]]

What I would like to do is filter this list of lists so that only unique combinations of name1 and name2 remain, regardless of their order:

[['a','b',4,5],
['x','y',2,10],
['d','y',8,10],
['a','d',4,8]]

Using the comments, the best I could come up with is:

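def removedupes(lst):
    newlist = []
    unique = set()
    for i in lst:
        if i[0] > i[1]:
            # swap the names and their values so the smaller name comes first
            templist = [i[1], i[0], i[3], i[2], i[1] + i[0]]
        else:
            templist = [i[0], i[1], i[2], i[3], i[0] + i[1]]
        # the last element is the combined-name key used to detect duplicates
        if templist[-1] not in unique:
            newlist.append(templist[:-1])
        unique.add(templist[-1])
    return newlist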

Just wondering if there's a more Pythonic way to accomplish this?

Upvotes: 1

Views: 464

Answers (4)

joseville

Reputation: 953

EDIT: I misunderstood the question, and I don't think this does exactly what the OP is looking for.


Here's how I would do it:

data = [['a','b',4,5],
['x','y',2,10],
['b','a',5,4],
['d','y',8,10],
['y','d',10,8],
['a','d',4,8]]

def foo(data):
    uniques = {}
    for n1, n2, v1, v2 in data:
        if n2 < n1:
            n1, n2 = n2, n1
        uniques[(n1, n2)] = (v1, v2)
    return [(n1, n2, v1, v2) for (n1, n2), (v1, v2) in uniques.items()]

print(foo(data))

Prints:

[('a', 'b', 5, 4), ('x', 'y', 2, 10), ('d', 'y', 10, 8), ('a', 'd', 4, 8)]

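When there's a collision (i.e. two elements with the same (n1, n2)), the code keeps the newer one. Note that only the names are swapped, not the values, so ['b','a',5,4] comes out as ('a', 'b', 5, 4).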
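
If the first occurrence should win instead, and each row should stay exactly as it appeared in the input, something like the following might work. This is only a sketch: the name keep_first is made up for illustration, and it assumes every row has exactly four elements and that the names can be compared with each other.

```python
def keep_first(rows):
    """Keep the first row seen for each unordered (name1, name2) pair."""
    seen = {}
    for n1, n2, v1, v2 in rows:
        # Normalise the key so ('b', 'a') and ('a', 'b') collide.
        key = (n1, n2) if n1 <= n2 else (n2, n1)
        # setdefault stores the row only when the key is not present yet.
        seen.setdefault(key, [n1, n2, v1, v2])
    # dicts preserve insertion order (Python 3.7+), so input order is kept.
    return list(seen.values())

data = [['a', 'b', 4, 5],
        ['x', 'y', 2, 10],
        ['b', 'a', 5, 4],
        ['d', 'y', 8, 10],
        ['y', 'd', 10, 8],
        ['a', 'd', 4, 8]]

print(keep_first(data))
# [['a', 'b', 4, 5], ['x', 'y', 2, 10], ['d', 'y', 8, 10], ['a', 'd', 4, 8]]
```

Because the stored value is the untouched row, each value stays paired with its own name.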

Upvotes: 1

hpchavaz

Reputation: 1388

This uses a dict keyed on the sorted pair of names; the pair has to be a tuple because dict keys must be immutable (hashable).

data = [
        ['a','b',4,5],
        ['x','y',2,10],
        ['b','a',5,4],
        ['d','y',8,10],
        ['y','d',10,8],
        ['a','d',4,8]
       ]

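def foo(llist):
    # Key on the ordered pair of names; keep the whole sublist as the value,
    # so a later sublist with the same pair replaces the earlier one.
    dic = {((k1, k2) if k2 > k1 else (k2, k1)): [k1, k2, *v] for k1, k2, *v in llist}
    return list(dic.values())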


>>> foo(data)

[['b', 'a', 5, 4], ['x', 'y', 2, 10], ['y', 'd', 10, 8], ['a', 'd', 4, 8]]

Notes: This answer:

  • breaks if the two elements of a pair cannot be compared with each other;
  • keeps the last sublist in case of a collision, which is acceptable because the specification gives no rule for that case;
  • accepts any number of 'values' (i.e. elements after the first two positions of a sublist), even when the sublists differ in length.
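
On the first note: if the names might not be comparable with each other (say, a mix of strings and integers), one possible workaround is to key the dict on a frozenset of the two names, which only needs them to be hashable. This is a sketch rather than a drop-in replacement; dedupe_unordered and the mixed sample data are made up for illustration.

```python
def dedupe_unordered(rows):
    """Last row wins; pairs are keyed by a frozenset, so no '<' comparison is needed."""
    latest = {}
    for row in rows:
        # frozenset(['a', 1]) == frozenset([1, 'a']), so order does not matter.
        latest[frozenset(row[:2])] = row
    return list(latest.values())

# 'a' < 1 would raise TypeError in Python 3, but hashing both is fine.
mixed = [['a', 1, 4, 5],
         [1, 'a', 5, 4],
         ['x', 'y', 2, 10]]

print(dedupe_unordered(mixed))
# [[1, 'a', 5, 4], ['x', 'y', 2, 10]]
```

As with the dict above, a colliding row replaces the stored value but keeps the position of the first occurrence.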

Upvotes: 1

Chris

Reputation: 36476

First off, let's write a list comprehension (that can readily be a generator expression) for the name components of each sublist, with them sorted.

[(a, b) if a < b else (b, a) for a, b, *_ in data]

Now, we can get a set of them to have the uniques.

>>> set((a, b) if a < b else (b, a) for a, b, *_ in data)
{('x', 'y'), ('a', 'd'), ('d', 'y'), ('a', 'b')}

Now we just need to be able to look up entries from data based on those tuples.

If we were looking for the combo ('a', 'd') (or ('d', 'a')):

>>> next(filter((lambda x: ('a', 'd') == (x[0], x[1]) if x[0] < x[1] else ('a', 'd') == (x[1], x[0])), data), None)
['a', 'd', 4, 8]

Now we just have to expand that to work on everything in our set.

>>> names_set = set((a, b) if a < b else (b, a) for a, b, *_ in data)
>>> [next(filter((lambda x: names == (x[0], x[1]) if x[0] < x[1] else names == (x[1], x[0])), data), None) for names in names_set]
[['x', 'y', 2, 10], ['a', 'd', 4, 8], ['d', 'y', 8, 10], ['a', 'b', 4, 5]]

We can also write this so that the conditional expression simply returns the correct tuple to compare to.

>>> [next(filter((lambda x: names == ((x[0], x[1]) if x[0] < x[1] else (x[1], x[0]))), data), None) for names in names_set]
[['x', 'y', 2, 10], ['a', 'd', 4, 8], ['d', 'y', 8, 10], ['a', 'b', 4, 5]]

Either way, the final result:

[['x', 'y', 2, 10], 
 ['a', 'd', 4,  8], 
 ['d', 'y', 8, 10], 
 ['a', 'b', 4,  5]]
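
Since a set has no defined order, the rows above can come back in a different order from the input. If input order matters, one option is to collect the name pairs with dict.fromkeys instead of set. A rough sketch of the same idea follows; the helper names norm and unique_rows are mine, not part of the approach above, and like the original it still rescans data once per pair.

```python
def norm(row):
    """Return the two names of a row as an ordered tuple."""
    a, b, *_ = row
    return (a, b) if a < b else (b, a)

def unique_rows(data):
    # dict.fromkeys keeps keys in first-seen order, unlike set().
    ordered_names = dict.fromkeys(norm(row) for row in data)
    # For each pair, take the first row in data that matches it.
    return [next(row for row in data if norm(row) == names)
            for names in ordered_names]

data = [['a', 'b', 4, 5],
        ['x', 'y', 2, 10],
        ['b', 'a', 5, 4],
        ['d', 'y', 8, 10],
        ['y', 'd', 10, 8],
        ['a', 'd', 4, 8]]

print(unique_rows(data))
# [['a', 'b', 4, 5], ['x', 'y', 2, 10], ['d', 'y', 8, 10], ['a', 'd', 4, 8]]
```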

Upvotes: 0

Woodford

Reputation: 4439

If you don't care about the order of the data you can reformat your input to be more amenable to manipulation.

>>> {tuple(sorted(((x[0], x[2]), (x[1], x[3])))) for x in data}
{(('a', 4), ('b', 5)),
 (('a', 4), ('d', 8)),
 (('d', 8), ('y', 10)),
 (('x', 2), ('y', 10))}

Brief breakdown:

  1. Convert each row into the format ((name1, val1), (name2, val2))
  2. Sort each row to allow comparison
  3. Use a set to filter out duplicates
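
If the original [name1, name2, val1, val2] layout is needed afterwards, the set can be flattened back. A minimal sketch, assuming every row has exactly four elements; to_rows is a hypothetical helper name, and sorted() is used only to get a reproducible order.

```python
def to_rows(data):
    # Steps 1-3: pair each name with its value, sort the two pairs, dedupe with a set.
    pairs = {tuple(sorted(((x[0], x[2]), (x[1], x[3])))) for x in data}
    # Flatten ((n1, v1), (n2, v2)) back into [n1, n2, v1, v2].
    return [[n1, n2, v1, v2] for (n1, v1), (n2, v2) in sorted(pairs)]

data = [['a', 'b', 4, 5],
        ['x', 'y', 2, 10],
        ['b', 'a', 5, 4],
        ['d', 'y', 8, 10],
        ['y', 'd', 10, 8],
        ['a', 'd', 4, 8]]

rows = to_rows(data)
print(rows)
# [['a', 'b', 4, 5], ['a', 'd', 4, 8], ['d', 'y', 8, 10], ['x', 'y', 2, 10]]
```

One thing to keep in mind: the values are part of what gets compared here, so two rows with the same names but different values would both survive.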

Upvotes: 1
