user3017335

Reputation: 285

Find duplicates within a list

I have a list containing values like the ones below:

h = [
    ('red', array([50, 344])),
    ('blue', array([15, 55])),
    ('green', array([1, 1])),
    ('orange', array([3, 7])),
    ('red', array([1, 1]))
]

I want to loop through the list and sum the numpy arrays whose labels are the same. In the example above there are two 'red' entries, so the desired outcome is the same list but with those two combined:

('red', array([ 50, 344])) + ('red', array([1, 1])) = ('red', array([51, 345]))

I have tried a nested loop like this:

for i in range(0, len(h)):
    for p in range(0, len(h)):
        if (h[i][0] == h[p][0]):
            A = h[i][1] + h[p][1]

However, this code also adds each entry's array to itself (when i == p), which I don't want. For each entry, I want to add the arrays of the other entries that share its label, without adding its own array to itself. I hope that's clear.

Upvotes: 4

Views: 85

Answers (2)

jonrsharpe

Reputation: 121955

I would recommend using a dictionary to do this:

out = {}
for colour, array_ in h:
    if colour in out:
        out[colour] += array_
    else:
        # copy so the += above never modifies the arrays stored in h
        out[colour] = array_.copy()

You can then get the list back from out.items() (on Python 3, use list(out.items())), which gives me

[('blue', array([15, 55])), ('orange', array([3, 7])), 
 ('green', array([1, 1])), ('red', array([ 51, 345]))]

for your example.

This single loop will be more efficient than the double loop you have now: it visits each item in the list only once, whereas the nested loop compares every item with every other item.
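For reference, here is a self-contained sketch of the same dictionary idea wrapped in a function. The name sum_by_label is only an illustrative choice, and the sketch assumes Python 3.7+ (where dicts keep insertion order) and that the arrays come from numpy. It builds a new array for each sum instead of using +=, so the arrays in h are left untouched.

```python
import numpy as np

def sum_by_label(pairs):
    """Sum the arrays of all (label, array) pairs that share a label."""
    totals = {}
    for label, arr in pairs:
        if label in totals:
            # `a + b` creates a new array, so the input arrays are not modified
            totals[label] = totals[label] + arr
        else:
            totals[label] = arr
    return list(totals.items())

h = [
    ('red', np.array([50, 344])),
    ('blue', np.array([15, 55])),
    ('green', np.array([1, 1])),
    ('orange', np.array([3, 7])),
    ('red', np.array([1, 1])),
]

result = sum_by_label(h)
```

Here result holds one entry per label, in order of first appearance, and the 'red' entry carries the combined array.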

Upvotes: 4

Mathias711

Reputation: 6658

You can add a check at the top of the inner loop that skips the case where both indices point to the same item:

if i == p:
    continue

Furthermore, the 0 in the range call can be omitted, because range starts from 0 by default.
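Put together, the nested loop might look like the sketch below. The names merged, label and total are my own, and I am assuming the goal is one output entry per original item. Note that this keeps both 'red' entries, each carrying the combined total, rather than collapsing them into one.

```python
import numpy as np

h = [
    ('red', np.array([50, 344])),
    ('blue', np.array([15, 55])),
    ('green', np.array([1, 1])),
    ('orange', np.array([3, 7])),
    ('red', np.array([1, 1])),
]

merged = []
for i in range(len(h)):
    label = h[i][0]
    total = h[i][1].copy()  # start from a copy so h itself is not modified
    for p in range(len(h)):
        if i == p:
            continue  # never add an entry to itself
        if h[p][0] == label:
            total += h[p][1]
    merged.append((label, total))
```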

Upvotes: 0
