Reputation: 149
I have a list of lists like this:
[[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], etc.]
If the first and second elements of an inner list are the same as the first and second elements of another inner list (as in the example above), I want to create a function that adds up the remaining values element-wise and merges them into one list. The expected output for the example would be:
[12411.0, 31937.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.2, 0.25, 0.3, 0.2, 0.25, 0.25, 0.25, 0.25]
I'm having trouble telling Python to first recognize and compare the first two elements of each inner list before merging the lists together. Here is my best attempt so far:
def group(A):
    for i in range(len(A)):
        for j in range(len(A[i])):
            if A[i][0:1] == A[i: ][0:1]:
                return [A[i][0], A[i][1], sum(A[i][j+2], A[i: ][j+2])]
I get an index error, I believe because of the A[i: ] and A[i: ][j+2] parts of the code. I don't know how to express in Python that the function should add up any other inner lists that meet the criteria.
Upvotes: 0
Views: 2488
Reputation: 63737
If you are fond of itertools, this can be solved with a little effort by combining groupby, islice, izip, imap and chain.
And of course you should also remember to use operator.itemgetter.
Implementation
# Python 2: izip and imap live in itertools
from itertools import groupby, islice, izip, imap, chain
from operator import itemgetter
# Create groups of lists whose key (the first two elements of each list) matches
groups = groupby(sorted(l, key = itemgetter(0, 1)), key = itemgetter(0, 1))
# zip the lists and then chop off the first two elements. Sum the elements of the resultant list
# Pair the newly accumulated sums with the key (the first two elements)
groups_sum = ([k, imap(sum, islice(izip(*g), 2, None))] for k, g in groups )
# Reformat the final list to match the output format
[list(chain.from_iterable(elem)) for elem in groups_sum]
Implementation (if you prefer a one-liner)
[list(chain.from_iterable([k, imap(sum, islice(izip(*g), 2, None))]))
 for k, g in groupby(sorted(l, key = itemgetter(0, 1)), key = itemgetter(0, 1))]
Sample Input
l = [[10,20,0.1,0.2,0.3,0.4],
     [11,22,0.1,0.2,0.3,0.4],
     [10,20,0.1,0.2,0.3,0.4],
     [11,22,0.1,0.2,0.3,0.4],
     [20,30,0.1,0.2,0.3,0.4],
     [10,20,0.1,0.2,0.3,0.4]]
Sample Output
[[10, 20, 0.3, 0.6, 0.9, 1.2],
 [11, 22, 0.2, 0.4, 0.6, 0.8],
 [20, 30, 0.1, 0.2, 0.3, 0.4]]
Dissection
groups = groupby(sorted(l, key = itemgetter(0, 1)), key = itemgetter(0, 1))
# After sorting and grouping, lists with the same key get clustered together
[((10, 20),
[[10, 20, 0.1, 0.2, 0.3, 0.4],
[10, 20, 0.1, 0.2, 0.3, 0.4],
[10, 20, 0.1, 0.2, 0.3, 0.4]]),
((11, 22), [[11, 22, 0.1, 0.2, 0.3, 0.4], [11, 22, 0.1, 0.2, 0.3, 0.4]]),
((20, 30), [[20, 30, 0.1, 0.2, 0.3, 0.4]])]
groups_sum = ([k, imap(sum, islice(izip(*g), 2, None))] for k, g in groups )
# Each group is summed column-wise from the third element (index 2) onwards
[[(10, 20), [0.3, 0.6, 0.9, 1.2]],
[(11, 22), [0.2, 0.4, 0.6, 0.8]],
[(20, 30), [0.1, 0.2, 0.3, 0.4]]]
[list(chain.from_iterable(elem)) for elem in groups_sum]
# Now it's just a matter of flattening each entry into the output format
[[10, 20, 0.3, 0.6, 0.9, 1.2],
[11, 22, 0.2, 0.4, 0.6, 0.8],
[20, 30, 0.1, 0.2, 0.3, 0.4]]
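On Python 3, izip and imap no longer exist in itertools; the built-in zip and map are already lazy. The following is a minimal sketch of the same sort, group and sum pipeline for Python 3. The function name merge_by_key is made up for this illustration, and integer values are used in the sample so that the sums are exact.

```python
from itertools import groupby
from operator import itemgetter

def merge_by_key(rows):
    """Sum rows that share the same first two elements (Python 3 sketch)."""
    key = itemgetter(0, 1)
    merged = []
    # groupby only clusters adjacent items, so sort by the same key first
    for k, group in groupby(sorted(rows, key=key), key=key):
        # zip(*group) transposes the rows into columns; drop the two key columns
        columns = list(zip(*group))[2:]
        merged.append(list(k) + [sum(col) for col in columns])
    return merged

l = [[10, 20, 1, 2, 3, 4],
     [11, 22, 1, 2, 3, 4],
     [10, 20, 1, 2, 3, 4],
     [11, 22, 1, 2, 3, 4],
     [20, 30, 1, 2, 3, 4],
     [10, 20, 1, 2, 3, 4]]
print(merge_by_key(l))
# [[10, 20, 3, 6, 9, 12], [11, 22, 2, 4, 6, 8], [20, 30, 1, 2, 3, 4]]
```

Because the rows are sorted first, the result comes back ordered by key rather than in the order the keys first appeared in the input.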
Upvotes: 1
Reputation: 10884
This is a function that will take a list of lists A and check internal list i and j using your criteria. It will then either return the summed list you want or None if the first two elements don't match.
def check_internal_ij(A,i,j):
    """ checks internal list i against internal list j """
    if A[i][0:2] == A[j][0:2]:
        new = [x+y for x,y in zip( A[i], A[j] )]
        new[0:2] = A[i][0:2]
        return new
    else:
        return None
Then you can run the function over all combinations of internal lists you want to check.
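One possible way to run it over all pairs is itertools.combinations, sketched below for Python 3. The function is repeated so the snippet runs on its own; the sample data and the pair_sums dict are assumptions made up for illustration.

```python
from itertools import combinations

def check_internal_ij(A, i, j):
    """Sum internal lists i and j if their first two elements match."""
    if A[i][0:2] == A[j][0:2]:
        new = [x + y for x, y in zip(A[i], A[j])]
        new[0:2] = A[i][0:2]
        return new
    return None

A = [[1, 3, 4, 5], [1, 3, 2, 2], [2, 3, 5, 6]]

# Try every unordered pair of indices and keep the pairs that match
pair_sums = {}
for i, j in combinations(range(len(A)), 2):
    merged = check_internal_ij(A, i, j)
    if merged is not None:
        pair_sums[(i, j)] = merged

print(pair_sums)  # {(0, 1): [1, 3, 6, 7]}
```

Note that this only produces pairwise sums: if three internal lists share the same first two elements, it yields three pairwise results rather than one combined total.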
Upvotes: 1
Reputation: 12092
This is one way to do it:
>>> a_list = [[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
>>> result = [a + b for a, b in zip(*a_list)]
>>> result[:2] = a_list[0][:2]
>>> result
[12411.0, 31937.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.2, 0.25, 0.30000000000000004, 0.2, 0.25, 0.25, 0.25, 0.25]
This works by blindly adding up corresponding elements of the two sub-lists (the a, b unpacking assumes exactly two) by doing:
[a + b for a, b in zip(*a_list)]
And then rewriting the first two elements of the result, which according to the question do not change, by doing:
result[:2] = a_list[0][:2]
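If there may be more than two sub-lists with the same first two elements, the same idea can be sketched with sum over each column instead of a + b. This is an illustrative variant, not part of the original snippet, and the sample data is made up (integers, so the sums are exact).

```python
a_list = [[1, 2, 10, 20], [1, 2, 1, 2], [1, 2, 5, 5]]

# sum(col) adds one column across however many sub-lists there are
result = [sum(col) for col in zip(*a_list)]
# restore the two key elements, which were summed along with the rest
result[:2] = a_list[0][:2]
print(result)  # [1, 2, 16, 27]
```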
It is not evident from your question what the behavior should be if the first two elements of the sub-lists do not match. But the following snippet will help you check whether the first two elements of the sub-lists match. Let's assume a_list contains sub-lists whose first two elements do not match:
>>> a_list = [[12411.0, 31937.0, 0.1, 0.1], [12411.3, 31937.0, 0.1, 0.1]]
then, this condition:
all([True if list(a)[1:] == list(a)[:-1] else False for a in list(zip(*a_list))[:2]])
will return False, and True otherwise. The code extracts the first elements and the second elements of all the sub-lists and then checks whether they are equal.
You can include the above check in your code and modify your code accordingly for the expected behavior.
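A shorter way to express the same check, as a sketch: collect the (first, second) pairs into a set and see whether at most one distinct pair remains. The helper name first_two_match is hypothetical.

```python
def first_two_match(a_list):
    """Return True if every sub-list starts with the same two elements."""
    keys = {tuple(sub[:2]) for sub in a_list}
    return len(keys) <= 1

print(first_two_match([[12411.0, 31937.0, 0.1, 0.1], [12411.0, 31937.0, 0.1, 0.1]]))  # True
print(first_two_match([[12411.0, 31937.0, 0.1, 0.1], [12411.3, 31937.0, 0.1, 0.1]]))  # False
```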
To sum it up:
a_list = [[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
check = all([True if list(a)[1:] == list(a)[:-1] else False for a in list(zip(*a_list))[:2]])
result = []
if check:
    result = [a + b for a, b in zip(*a_list)]
    result[:2] = a_list[0][:2]
else:
    # whatever the behavior should be.
    pass
Upvotes: 3
Reputation: 94881
Here's a function that will merge all sublists where the first two entries match. It also handles cases where the sub-lists are not the same length:
from itertools import izip_longest
l = [[1,3,4,5,6], [1,3,2,2,2], [2,3,5,6,6], [1,1,1,1,1], [1,1,2,2,2], [1,3,6,2,1,1,2]]
l2 = [[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
def merge(l):
    d = {}
    for ent in l:
        key = tuple(ent[0:2])
        merged = d.get(key, None)
        if merged is None:
            d[key] = ent
        else:
            merged[2:] = [a+b for a,b in izip_longest(merged[2:], ent[2:], fillvalue=0)]
    return d.values()
print merge(l)
print merge(l2)
Output:
[[1, 3, 12, 9, 9, 1, 2], [2, 3, 5, 6, 6], [1, 1, 3, 3, 3]]
[[12411.0, 31937.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.2, 0.25, 0.30000000000000004, 0.2, 0.25, 0.25, 0.25, 0.25]]
It's implemented by maintaining a dict whose keys are the first two entries of a sub-list (stored as a tuple). As we iterate over the sub-lists, we check whether the dict already has an entry for the current key. If it doesn't, we store the current sub-list in the dict. If it does, we add the current sub-list's values from index 2 onward to the stored entry. Once we're done iterating, we return all the values from the dict.
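For Python 3, where izip_longest is named zip_longest and print is a function, the same dict-based approach might look like the sketch below. Storing a copy of each first-seen sub-list (so the caller's input is left unmodified) is an addition of this sketch, not something the code above does.

```python
from itertools import zip_longest

def merge(rows):
    """Merge rows whose first two entries match, summing the rest."""
    d = {}
    for ent in rows:
        key = tuple(ent[:2])
        if key not in d:
            # store a copy so the caller's lists are not modified
            d[key] = list(ent)
        else:
            merged = d[key]
            # fillvalue=0 pads the shorter row, so unequal lengths still add up
            merged[2:] = [a + b for a, b in
                          zip_longest(merged[2:], ent[2:], fillvalue=0)]
    return list(d.values())

l = [[1, 3, 4, 5, 6], [1, 3, 2, 2, 2], [2, 3, 5, 6, 6],
     [1, 1, 1, 1, 1], [1, 1, 2, 2, 2], [1, 3, 6, 2, 1, 1, 2]]
print(merge(l))
# [[1, 3, 12, 9, 9, 1, 2], [2, 3, 5, 6, 6], [1, 1, 3, 3, 3]]
```

On Python 3.7 and later, dicts preserve insertion order, so the merged rows come back in the order their keys were first seen.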
Upvotes: 3