Hashmi

Reputation: 147

Comparing two Lists with Strings and Sublists in them

I am stuck on a problem where I have to compare the sublists within a list of lists. Each sublist contains two strings and a sub-sublist. I want to compare each sublist with every later sublist and record both IDs (the first string of each) together with the members they share in the third item (the sub-sublist). That sounds a bit confusing, so here is an example. I have the following list of lists:

node = [['1001', '2008-01-06T02:12:13Z', ['']], 
        ['1002', '2008-01-06T02:13:55Z', ['']],  
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']], 
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']], 
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']], 
        ['1006', '2008-01-06T02:13:30Z', ['']], 
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

The first item of each sublist is an ID, the second is a timestamp, and the third (the sub-sublist) contains the members. I want to compare the members and, if two sublists have members in common, store those members in a new list along with both IDs, as follows:

output_list = [['1003', '1004', ['Lion', 'Leopard', 'Panda']], 
               ['1003', '1005', ['Lion', 'Panda']], 
               ['1004', '1005', ['Lion', 'Panda', 'Tiger']], 
               ['1004', '1007', ['Tiger']], 
               ['1005', '1007', ['Cheetah', 'Goat', 'Tiger']]]

I can't get my head around how to write the double for loop, or any other way of doing it. Can anyone help me, please? Sorry, I don't have any attempted code to show.

Upvotes: 0

Views: 384

Answers (5)

Chiheb Nexus

Reputation: 9257

I think the best approach is to use combinations from itertools and take the intersection of the member lists after converting them to sets, as in this example:

from itertools import combinations

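def compare(node, grouping=2):
    # Note: the two-name unpacking below only works with grouping=2
    for elm1, elm2 in combinations(node, grouping):
        # Members shared by both sublists (the last item of each)
        condition = set(elm1[-1]) & set(elm2[-1])
        # Skip pairs with nothing in common or only the '' placeholder
        if bool(condition) and condition != {''}:
            yield elm1[0], elm2[0], list(condition)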

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

final = list(compare(node))
print(final)

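Output (wrapped here for readability; because sets are unordered, the members within each match may print in a different order):

[('1003', '1004', ['Lion', 'Leopard', 'Panda']),
 ('1003', '1005', ['Lion', 'Panda']),
 ('1004', '1005', ['Lion', 'Tiger', 'Panda']),
 ('1004', '1007', ['Tiger']),
 ('1005', '1007', ['Goat', 'Tiger', 'Cheetah'])]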
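
If you want to compare more than two sublists at a time, the same idea can be generalised. The following is only a sketch: compare_groups is a hypothetical name, not part of the code above, and it sorts the shared members so the result does not depend on set ordering.

```python
from itertools import combinations

def compare_groups(node, grouping=2):
    """Yield (ids, shared_members) for every group of `grouping` sublists."""
    for group in combinations(node, grouping):
        # Intersect the member sets of every sublist in the group.
        shared = set.intersection(*(set(item[-1]) for item in group))
        shared.discard('')  # ignore the empty-string placeholder
        if shared:
            # Sort so the result does not depend on set iteration order.
            yield [item[0] for item in group], sorted(shared)

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

for ids, shared in compare_groups(node, grouping=3):
    print(ids, shared)
```

With grouping=3 this reports the members shared by three IDs at once; with the default grouping=2 it gives the pairwise matches from the question, with the members in alphabetical order.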

Upvotes: 1

Shu ba

Reputation: 334

It looks like what you are looking for is itertools.combinations, which is part of Python's standard library:

import itertools

# Map each ID to the set of its members (dicts keep insertion order in Python 3.7+)
di = {i[0]: set(i[2]) for i in node}
outputlist = []
for i, j in itertools.combinations(di, 2):
    common = list((di[i] & di[j]) - {''})  # shared members, ignoring the '' placeholder
    if common:  # skip pairs with nothing in common
        outputlist.append([i, j, common])

You can even skip the di stage and run combinations on node directly:

import itertools

outputlist = []
for i, j in itertools.combinations(node, 2):
    common = list((set(i[2]) & set(j[2])) - {''})  # shared members, ignoring ''
    if common:  # skip pairs with nothing in common
        outputlist.append([i[0], j[0], common])

I would also recommend keeping the animals as sets and representing "no members" with an empty Python list ([]) rather than [''].
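
As a rough sketch of that recommendation (normalise is a hypothetical helper, and the data is a subset of the question's list), the members can be converted to sets once, with an empty set standing in for "no members". The check for '' then disappears from the comparison itself:

```python
from itertools import combinations

# Subset of the question's data.
raw = [['1001', '2008-01-06T02:12:13Z', ['']],
       ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
       ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']]]

def normalise(rows):
    """Hypothetical helper: store members as sets and drop the '' placeholder."""
    return [[row_id, stamp, {m for m in members if m}]
            for row_id, stamp, members in rows]

cleaned = normalise(raw)

# With empty sets for "no members", a plain truthiness test is enough.
pairs = [[a[0], b[0], a[2] & b[2]]
         for a, b in combinations(cleaned, 2)
         if a[2] & b[2]]
print(pairs)
```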

EDIT

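If you stick to lists, you can compute the shared members with a list comprehension instead of a set intersection:

common = [x for x in j[2] if x in i[2]]

This avoids converting between types and keeps the order of j[2]. Each membership test on a list is a linear scan, though, so sets will usually be faster for long lists.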

It all comes down to:

import itertools
pairs = ([i[0], j[0], [x for x in j[2] if x and x in i[2]]] for i, j in itertools.combinations(node, 2))
output_list = [p for p in pairs if p[2]]

Upvotes: 1

dmitryro

Reputation: 3506

One more way to approach it is to record each member once, together with the first ID it appears under:

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]
# Use this list for the result
result = []

def member_exists(member, seen):
    """Return True if member is already recorded in seen, a list of [ID, member] pairs."""
    for entry in seen:
        if entry[1] == member:
            return True
    return False

# Iterate and record each member with the first ID it appears under
for item in node:
    for member in item[2]:
        if not member_exists(member, result):
            result.append([item[0], member])

# Print out the result
print(result)

Upvotes: 1

SigmaPiEpsilon

Reputation: 698

Here is the simplest way if the order in the matching list is important.

>>> out  = []
>>> for ii, elem in enumerate(node[:-1]):
...     for jj in range(ii + 1, len(node)):
...         common = [subelem for subelem in elem[-1] if subelem in node[jj][-1]]
...         if len(common) > 0 and common != ['']:
...             out.append([elem[0], node[jj][0], common])
...
>>> for elem in out:
...     print(elem)
... 
['1003', '1004', ['Lion', 'Leopard', 'Panda']]
['1003', '1005', ['Lion', 'Panda']]
['1004', '1005', ['Lion', 'Panda', 'Tiger']]
['1004', '1007', ['Tiger']]
['1005', '1007', ['Cheetah', 'Goat', 'Tiger']]

If order is not important and the lists are big, use a set intersection for the first line inside the double loop, as below:

common = list(set(elem[-1]).intersection(set(node[jj][-1])))
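
If the lists are big but the order still matters, the two ideas can be combined. This is a minimal sketch (common_in_order is a hypothetical helper name): it builds a set from the second list for fast lookups while iterating over the first list, so the original order is preserved.

```python
def common_in_order(first, second):
    """Return the members of `first` that also occur in `second`, in `first`'s order."""
    lookup = set(second)  # constant-time membership tests on average
    # Skip the '' placeholder and keep only members found in both lists.
    return [member for member in first if member and member in lookup]

print(common_in_order(['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger'],
                      ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']))
```

Inside the double loop this would replace the line that computes common, for example common = common_in_order(elem[-1], node[jj][-1]).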

Upvotes: 1

Guy S.

Reputation: 45

You can calculate an MD5 hash for each list and compare the hashes, just like a checksum:

node_md5hash = hashlib.md5(bencode.bencode(node)).hexdigest()
output_list_md5hash = hashlib.md5(bencode.bencode(output_list)).hexdigest()

This gives you one MD5 hash for node and one for output_list; if the hashes are the same, so are their values. Note that this only tells you whether two whole lists are identical, not which members they share.

You will need to import the hashlib library and the bencode library (you will probably have to pip install bencode).
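
For illustration, the checksum idea can also be sketched with the standard library only. This swaps bencode for json.dumps as the serialiser, which is my assumption rather than part of the answer above, and list_digest is a hypothetical helper name.

```python
import hashlib
import json

def list_digest(value):
    """Return an MD5 hex digest of a JSON-serialisable value."""
    # json.dumps stands in for bencode here (an assumption, not the original method).
    encoded = json.dumps(value).encode('utf-8')
    return hashlib.md5(encoded).hexdigest()

a = ['Lion', 'Panda']
b = ['Lion', 'Panda']
c = ['Panda', 'Lion']

print(list_digest(a) == list_digest(b))  # identical lists give identical digests
print(list_digest(a) == list_digest(c))  # a different order gives a different digest
```

Because the digest depends on the exact contents and order, it can confirm that two lists are equal but cannot report partial overlaps between them.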

Upvotes: 1
