Reputation: 147
I need to compare the entries of a list of lists, where each sublist contains two strings and a sub-sublist. I want to compare each sublist with every later sublist and record both first strings together with the identifiers that match in the third item (the sub-sublist). That sounds a bit confusing, so here is an example. I have the following list of lists:
node = [['1001', '2008-01-06T02:12:13Z', ['']],
['1002', '2008-01-06T02:13:55Z', ['']],
['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
['1006', '2008-01-06T02:13:30Z', ['']],
['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]
The first item of each sublist is an ID, the second item is a timestamp, and the third item (the sub-sublist) contains the members. I want to compare the members, and if two sublists have members in common, store the two IDs and the shared members in a new list, as follows:
output_list = [['1003', '1004', ['Lion', 'Leopard', 'Panda']],
['1003', '1005', ['Lion', 'Panda']],
['1004', '1005', ['Lion', 'Panda', 'Tiger']],
['1004', '1007', ['Tiger']],
['1005', '1007', ['Cheetah', 'Goat', 'Tiger']]]
I can't get my head around how to write a double for loop, or any other way of doing this. Can anyone help me, please? Sorry, I have no attempted code to show.
Upvotes: 0
Views: 384
Reputation: 9257
I think the best approach is to use combinations from itertools together with an intersection of your member lists converted to sets, as in this example:
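from itertools import combinations

def compare(node, grouping=2):
    # Compare every pair of sublists (grouping must stay 2 for the unpacking below).
    for elm1, elm2 in combinations(node, grouping):
        # Members shared by both sublists.
        condition = set(elm1[-1]) & set(elm2[-1])
        # Skip empty intersections and the '' placeholder.
        if condition and condition != {''}:
            yield [elm1[0], elm2[0], list(condition)]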
node = [['1001', '2008-01-06T02:12:13Z', ['']],
['1002', '2008-01-06T02:13:55Z', ['']],
['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
['1006', '2008-01-06T02:13:30Z', ['']],
['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]
final = list(compare(node))
print(final)
Output (the order of names within each match can vary, because sets are unordered):
[['1003', '1004', ['Lion', 'Leopard', 'Panda']],
['1003', '1005', ['Lion', 'Panda']],
['1004', '1005', ['Lion', 'Tiger', 'Panda']],
['1004', '1007', ['Tiger']],
['1005', '1007', ['Goat', 'Tiger', 'Cheetah']]]
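The grouping parameter above suggests comparing more than two sublists at a time, but the two-name unpacking in the loop only works for pairs. The following is a minimal sketch of one way to generalize it, assuming the goal is the members shared by every row in a group. The name compare_groups and the use of sorted() for a repeatable order are illustrative choices, not part of the original answer.

```python
from itertools import combinations

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

def compare_groups(node, grouping=2):
    """Yield [id_1, ..., id_n, shared_members] for each group of `grouping` rows."""
    for group in combinations(node, grouping):
        # Intersect the member sets of every row in the group.
        shared = set.intersection(*(set(row[-1]) for row in group))
        # '' is the question's placeholder for "no members"; never report it.
        shared.discard('')
        if shared:
            # sorted() gives a repeatable order, since sets are unordered.
            yield [row[0] for row in group] + [sorted(shared)]

pairs = list(compare_groups(node))
triples = list(compare_groups(node, grouping=3))
print(pairs)
print(triples)
```

With grouping=2 this reports the same five pairs as above, with the shared names in alphabetical order. With grouping=3 it reports the members common to three rows at once.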
Upvotes: 1
Reputation: 334
It looks like what you need is itertools.combinations, which is part of Python's standard library:
import itertools

# Map each ID to the set of its members.
di = {i[0]: set(i[2]) for i in node}
outputlist = []
for i, j in itertools.combinations(di.keys(), 2):
    # Intersect the two member sets and drop the '' placeholder.
    common = list((di[i] & di[j]) - {''})
    if common:  # skip pairs with nothing in common
        outputlist.append([i, j, common])
You can even skip the di step and apply combinations to node directly:
outputlist = []
for i, j in itertools.combinations(node, 2):
    # Intersect the two member lists as sets and drop the '' placeholder.
    common = list(set(i[2]).intersection(j[2]) - {''})
    if common:  # skip pairs with nothing in common
        outputlist.append([i[0], j[0], common])
I would also recommend storing the animals as sets and representing a row with no members as an empty collection rather than [''].
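A minimal sketch of that recommendation, assuming the data arrives in the question's format. The helper name normalize and the shortened sample data are hypothetical and used only for illustration.

```python
from itertools import combinations

# A shortened copy of the question's data, used only for illustration.
node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']]]

def normalize(rows):
    """Hypothetical helper: map each ID to a frozenset of its non-empty members."""
    return {row[0]: frozenset(m for m in row[2] if m) for row in rows}

members = normalize(node)

matches = []
# Iterating a dict yields its keys in insertion order (Python 3.7+).
for a, b in combinations(members, 2):
    common = members[a] & members[b]
    if common:  # an empty frozenset is falsy, so no '' check is needed
        matches.append([a, b, sorted(common)])

print(matches)
```

Once the placeholder is gone, the test for a match reduces to a plain truthiness check on the intersection.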
If you stick with lists, you may prefer
common = list(filter(lambda x: x in i[2], j[2]))  # list() is needed on Python 3, where filter returns an iterator
since converting between types is a bit inefficient.
import itertools
pairs = [[i[0], j[0], [x for x in j[2] if x in i[2]]] for i, j in itertools.combinations(node, 2)]
output_list = [p for p in pairs if p[2] and p[2] != ['']]  # drop empty and ''-only matches
Upvotes: 1
Reputation: 3506
One more way to do it:
node = [['1001', '2008-01-06T02:12:13Z', ['']],
['1002', '2008-01-06T02:13:55Z', ['']],
['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
['1006', '2008-01-06T02:13:30Z', ['']],
['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]
# Use this list for result
result = []

def city_exists(city, cities):
    """ Just a helper to verify if city already used """
    for c in cities:
        if c[1] == city:
            return True
    return False

# And finally, iterate and add to the resulting list
for item in node:
    for city in item[2]:
        if not city_exists(city, result):
            result.append([item[0], city])

# Print out the result
print(result)
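The loop above records, for each member, the first ID it appears under. A related sketch, under the assumption that the pairwise output from the question is still the goal, keeps every ID per member and derives the shared members from that index. The function name shared_by_index is hypothetical, and the order of names relies on dictionaries preserving insertion order (Python 3.7+).

```python
from collections import defaultdict
from itertools import combinations

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

def shared_by_index(rows):
    # member -> IDs of the rows that contain it, in input order
    index = defaultdict(list)
    for row_id, _timestamp, members in rows:
        for member in members:
            # Skip the '' placeholder and avoid listing an ID twice.
            if member and row_id not in index[member]:
                index[member].append(row_id)

    # (id_a, id_b) -> members that both rows contain
    shared = defaultdict(list)
    for member, ids in index.items():
        for pair in combinations(ids, 2):
            shared[pair].append(member)

    # Keys are unique, so sorting orders the result by ID pair only.
    return [[a, b, names] for (a, b), names in sorted(shared.items())]

result = shared_by_index(node)
for row in result:
    print(row)
```

This only touches pairs that actually share a member, which may help when most rows have nothing in common.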
Upvotes: 1
Reputation: 698
Here is the simplest way if the order in the matching list is important.
>>> out = []
>>> for ii, elem in enumerate(node[:-1]):
...     for jj in range(ii + 1, len(node)):
...         common = [subelem for subelem in elem[-1] if subelem in node[jj][-1]]
...         if len(common) > 0 and common != ['']:
...             out.append([elem[0], node[jj][0], common])
...
>>> for elem in out:
...     print(elem)
...
['1003', '1004', ['Lion', 'Leopard', 'Panda']]
['1003', '1005', ['Lion', 'Panda']]
['1004', '1005', ['Lion', 'Panda', 'Tiger']]
['1004', '1007', ['Tiger']]
['1005', '1007', ['Cheetah', 'Goat', 'Tiger']]
If order is not important and the lists are big, use a set intersection to compute common in the inner loop instead, as below:
common = list(set(elem[-1]).intersection(set(node[jj][-1])))
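The two variants can also be combined. The sketch below assumes you want the order of the first sublist preserved while still getting fast membership tests on large lists; the helper name common_in_order is hypothetical.

```python
node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

def common_in_order(first, second):
    """Members of `first` that also occur in `second`, in the order of `first`."""
    lookup = set(second)  # set membership tests are O(1) on average
    # `name and ...` skips the '' placeholder used for "no members".
    return [name for name in first if name and name in lookup]

out = []
for ii, elem in enumerate(node):
    for other in node[ii + 1:]:
        common = common_in_order(elem[-1], other[-1])
        if common:
            out.append([elem[0], other[0], common])

for elem in out:
    print(elem)
```

Only the lookup side is converted to a set, so the result keeps the order of the earlier sublist, as in the list-comprehension version.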
Upvotes: 1
Reputation: 45
You can calculate an MD5 hash for each list and compare the hashes, just like a checksum.
import hashlib
import bencode
node_md5hash = hashlib.md5(bencode.bencode(node)).hexdigest()
output_list_md5hash = hashlib.md5(bencode.bencode(output_list)).hexdigest()
This gives you one MD5 hash for node and one for the output list; if the hashes are the same, so are their values.
You will need to import hashlib (part of the standard library) and the bencode library (you will probably have to pip install bencode).
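As a sketch of the same checksum idea using only the standard library, the example below swaps bencode for json.dumps as the serializer. That substitution is an assumption made for illustration, and the helper name md5_of is hypothetical. Note that a checksum answers whether two structures are identical as a whole; it does not tell you which members they share.

```python
import hashlib
import json

def md5_of(obj):
    """Hypothetical helper: MD5 hex digest of a JSON-serializable object."""
    # json.dumps produces the same text for equal nested lists of strings.
    encoded = json.dumps(obj, sort_keys=True).encode('utf-8')
    return hashlib.md5(encoded).hexdigest()

a = ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Panda']]
b = ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Panda']]
c = ['1003', '2008-01-06T02:13:00Z', ['Panda', 'Lion']]

same = md5_of(a) == md5_of(b)        # identical content, identical hash
reordered = md5_of(a) == md5_of(c)   # same members in a different order

print(same, reordered)
```

Because list order is part of the serialized text, two sublists with the same members in a different order hash differently.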
Upvotes: 1