Reputation: 45
I am iterating over a collection of 10-20 lists, each containing about 500,000 sublists. The lists look like this:
A = [['a', 'b', 1.7], ['d', 'e', 6.2] ...]
B = [['a', 'b', 2.0], ['d', 'e', 10.0] ...]
C = [['a', 'b', 3.0], ['d', 'e', 7.0] ...]
and so on. My goal is to end up with a single list like this:
final = [['a', 'b', 1.7, 2.0, 3.0], ['d', 'e', 6.2, 10.0, 7.0] ...]
I have already tried nested loops, comparing a template list (e.g. A) against a list (total) that holds the sublists of all the lists:
total = [['a', 'b', 1.7], ['d', 'e', 6.2], ['a', 'b', 2.0], ['d', 'e', 10.0], ['a', 'b', 3.0], ['d', 'e', 7.0]]
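temp = []
for i in A:
    new = [i[0:2]]            # the two key strings, kept inside their own sublist
    for j in total:
        if i[0:2] == j[0:2]:  # compare both key strings, not just the first one
            new.append(j[2])
    temp.append(new)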
This gets me close to what I want, except that the initial strings end up inside a sublist, which would be easy to work around later. The problem is that, given the size of the lists, the whole process takes a very long time, because the nested loop compares every sublist of A with every sublist of total. Any alternative approach or tip to speed this up would be appreciated.
Upvotes: 1
Views: 83
Reputation: 250941
A dict is more appropriate here, as it lets you access the values for any key in O(1) average time, so a single pass over total is enough.
Using collections.defaultdict:
>>> from collections import defaultdict
>>> total = [['a', 'b', 1.7], ['d', 'e', 6.2], ['a', 'b', 2.0], ['d', 'e', 10.0], ['a', 'b', 3.0], ['d', 'e', 7.0]]
>>> dic = defaultdict(list)
>>> for item in total:
...     key = tuple(item[:2])  # tuples are hashable, so they can be used as dictionary keys
...     val = item[2]
...     dic[key].append(val)
...
>>> dic
defaultdict(<type 'list'>,
            {('a', 'b'): [1.7, 2.0, 3.0],
             ('d', 'e'): [6.2, 10.0, 7.0]})
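To get the final layout asked for in the question, each key tuple can be turned back into a list and concatenated with its values. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order, so the keys come out in the order they first appear in total):

```python
from collections import defaultdict

total = [['a', 'b', 1.7], ['d', 'e', 6.2], ['a', 'b', 2.0],
         ['d', 'e', 10.0], ['a', 'b', 3.0], ['d', 'e', 7.0]]

# Group the numeric values by their two leading strings.
dic = defaultdict(list)
for item in total:
    dic[tuple(item[:2])].append(item[2])

# Flatten each (key, values) pair back into one sublist:
# ('a', 'b') and [1.7, 2.0, 3.0] -> ['a', 'b', 1.7, 2.0, 3.0]
final = [list(key) + values for key, values in dic.items()]
print(final)
```

This also removes the extra sublist around the initial strings that the nested-loop version produced.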
Using a normal dict:
>>> dic = {}
>>> for item in total:
...     key = tuple(item[:2])  # tuples are hashable, so they can be used as dictionary keys
...     val = item[2]
...     dic.setdefault(key, []).append(val)
...
>>> dic
{('a', 'b'): [1.7, 2.0, 3.0], ('d', 'e'): [6.2, 10.0, 7.0]}
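The same grouping can be applied to the original lists directly, without first concatenating them into total. A sketch, assuming every sublist has exactly three elements (two key strings and one value) and Python 3.7+ for ordering; merge_lists is a hypothetical helper name, and itertools.chain is used here to iterate over A, B, C, ... one after another:

```python
from collections import defaultdict
from itertools import chain


def merge_lists(*lists):
    """Group [str, str, value] rows from several lists by their two strings.

    merge_lists is a hypothetical helper used for illustration only.
    Assumes every row has exactly three elements.
    """
    grouped = defaultdict(list)
    # chain() walks A, then B, then C, ... without building one combined list.
    for first, second, value in chain(*lists):
        grouped[(first, second)].append(value)
    # One output row per key: the two strings followed by all collected values.
    return [[first, second, *values] for (first, second), values in grouped.items()]


A = [['a', 'b', 1.7], ['d', 'e', 6.2]]
B = [['a', 'b', 2.0], ['d', 'e', 10.0]]
C = [['a', 'b', 3.0], ['d', 'e', 7.0]]

final = merge_lists(A, B, C)
print(final)
```

This is a single pass over all rows, so the cost grows linearly with the total number of sublists rather than with len(A) × len(total) as in the nested-loop version.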
Upvotes: 3