Reputation: 61

Remove the duplicate values and sum the corresponding column values

I have a list from which I need to remove the duplicate values and sum the corresponding column values. The list is:

lst = [['20150815171000', '1', '2'],
       ['20150815171000', '2', '3'],
       ['20150815172000', '3', '4'],
       ['20150815172000', '4', '5'],
       ['20150815172000', '5', '6'],
       ['20150815173000', '6', '7']]

Now I need to traverse through the list and get the output something like this:

lst2 = [['20150815171000', '3', '5'], 
        ['20150815172000', '12', '15'], 
        ['20150815173000', '6', '7']]

How could this be done? I have tried writing the code shown below, but it only compares consecutive values, not all the matching ones.

    lst2 = []
    ws = wr = power = 0
    for i in range(len(lst)):
        if lst[i][0] == lst[i+1][0]:
            time = lst[i][0]
            ws = (float(lst[i][1])+float(lst[i+1][1]))
            wr = (float(lst[i][2])+float(lst[i+1][2]))      
        else:
           time = lst[i][0]
           ws = lst[i][1]
           wr = lst[i][2]
        lst2.append([time, ws, wr, power])

Can anyone let me know how I can do this?

Upvotes: 6

Answers (5)

Mark Shuster

Reputation: 31

A clean approach with a lambda and sorted(), using a dictionary and no additional libraries.

lst = [['20150815171000', '1', '2'],
       ['20150815171000', '2', '3'],
       ['20150815172000', '3', '4'],
       ['20150815172000', '4', '5'],
       ['20150815172000', '5', '6'],
       ['20150815173000', '6', '7']]

dct = dict()
for a, b, c in lst:
    if a not in dct: 
        dct[a] = [b, c] 
    else: 
        # wrap in list() so the result is indexable on Python 3, where map() returns an iterator
        dct[a] = list(map(lambda x, y: str(int(x)+int(y)), dct[a], [b,c]))
lst2 = sorted([[k,v[0],v[1]] for k,v in dct.items()])

print(lst2)

Out:

[['20150815171000', '3', '5'], 
['20150815172000', '12', '15'], 
['20150815173000', '6', '7']]

Upvotes: 3

Anzel

Reputation: 20583

Alternatively, I would suggest using pandas; it is quite straightforward with groupby and sum. Here is one way to do it:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(
[['20150815171000', '1', '2'],
 ['20150815171000', '2', '3'],
 ['20150815172000', '3', '4'],
 ['20150815172000', '4', '5'],
 ['20150815172000', '5', '6'],
 ['20150815173000', '6', '7']],
columns=['group', 'field1', 'field2'])

In [3]: df
Out[3]:
            group field1 field2
0  20150815171000      1      2
1  20150815171000      2      3
2  20150815172000      3      4
3  20150815172000      4      5
4  20150815172000      5      6
5  20150815173000      6      7

# need to convert from '1', '2'... to integer type
In [4]: df['field1'] = df['field1'].astype('int')

In [5]: df['field2'] = df['field2'].astype('int')

# this groupby(to_group_field) and sum() can achieve what you want
In [6]: df.groupby('group').sum()
Out[6]:
                field1  field2
group
20150815171000       3       5
20150815172000      12      15
20150815173000       6       7

# convert to the list of lists format as you expected
In [7]: df.groupby('group').sum().reset_index().values.tolist()
Out[7]:
[['20150815171000', 3, 5],
 ['20150815172000', 12, 15],
 ['20150815173000', 6, 7]]

Hope this helps.

Upvotes: 2

Sparkas

Reputation: 175

As was commented on your question, I would also suggest using a dictionary. I'm not a good programmer and there are certainly better ways, but this works:

dct = dict()
for x, y, z in lst:
    if x not in dct:
        dct[x] = [y, z]
    else:
        dct[x] = [str(int(dct[x][0]) + int(y)), str(int(dct[x][1]) + int(z))]
lst2 = []
for k, v in dct.items():
    lst2.append([k, v[0], v[1]])

You are basically just iterating over the list, adding a new item to the dictionary if the wanted key (e.g. '20150815171000') doesn't exist yet, and otherwise updating the corresponding values. In the end you just create another list out of the results in the dictionary.
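
The same add-or-update idea can also be sketched with collections.defaultdict, which removes the explicit membership check. This is only a minimal sketch and not part of the code above: the name totals is made up for illustration, and it assumes every row has exactly two value columns holding integer strings.

```python
from collections import defaultdict

lst = [['20150815171000', '1', '2'],
       ['20150815171000', '2', '3'],
       ['20150815172000', '3', '4'],
       ['20150815172000', '4', '5'],
       ['20150815172000', '5', '6'],
       ['20150815173000', '6', '7']]

# Each key seen for the first time starts with running totals [0, 0].
totals = defaultdict(lambda: [0, 0])
for key, y, z in lst:
    totals[key][0] += int(y)
    totals[key][1] += int(z)

# Convert the totals back to strings to match the requested output format.
lst2 = [[key, str(a), str(b)] for key, (a, b) in totals.items()]
print(lst2)
```

On Python 3.7+ the dictionary keeps insertion order, so the rows come out in the order the keys were first seen.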

Upvotes: 1

m00am

Reputation: 6298

You could use a dictionary to manage the unique entries in your list. Check whether a key is already in the dict: if it is, add to the existing entry; otherwise add a new entry to the dict.

Try this:

#!/usr/bin/env python3

sums = dict()
for key, *values in lst:
    try:
        # add to an already present entry in the dict
        sums[key] = [int(x)+y for x, y in zip(values, sums[key])]
    except KeyError:
        # if the entry is not already present add it to the dict
        # and cast the values to int to make the adding easier
        sums[key] = list(map(int, values))

# build the output list from dictionary
# also cast back the values to strings
lst2 = sorted([[key]+list(map(str, values)) for key, values in sums.items()])

The sorted in the last line is optional, depending on whether you need the output list to be sorted by the dict keys.

Note that this should work for any length of values after the key.
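
To illustrate that last point, here is a small sketch of the same idea wrapped in a function and fed rows with three value columns. The function name sum_by_key and the sample rows are made up for this example, and it assumes all values are integer strings and every row with the same key has the same number of columns.

```python
def sum_by_key(rows):
    """Sum all value columns of the rows that share the same first element."""
    sums = {}
    for key, *values in rows:
        numbers = [int(v) for v in values]
        if key in sums:
            # add column-wise to the entry that is already present
            sums[key] = [a + b for a, b in zip(sums[key], numbers)]
        else:
            sums[key] = numbers
    # cast the sums back to strings and sort by key
    return sorted([key] + [str(n) for n in sums[key]] for key in sums)

# Made-up sample rows with three value columns instead of two.
rows = [['a', '1', '2', '3'],
        ['b', '10', '20', '30'],
        ['a', '4', '5', '6']]
result = sum_by_key(rows)
print(result)
```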

Upvotes: 3

Anand S Kumar

Reputation: 91007

I would use itertools.groupby, grouping based on the first element of each inner list.

So I would first sort the list by the first element and then group on it. (If the list is already sorted on that element, you do not need to sort again; you can group directly.)

Example -

import itertools

new_lst = []
for k,g in itertools.groupby(sorted(lst,key=lambda x:x[0]) , lambda x:x[0]):
    l = list(g)
    new_lst.append([k,str(sum([int(x[1]) for x in l])), str(sum([int(x[2]) for x in l]))])

Demo -

>>> import itertools
>>>
>>> lst = [['20150815171000', '1', '2'],
...        ['20150815171000', '2', '3'],
...        ['20150815172000', '3', '4'],
...        ['20150815172000', '4', '5'],
...        ['20150815172000', '5', '6'],
...        ['20150815173000', '6', '7']]
>>>
>>> new_lst = []
>>> for k,g in itertools.groupby(sorted(lst,key=lambda x:x[0]) , lambda x:x[0]):
...     l = list(g)
...     new_lst.append([k,str(sum([int(x[1]) for x in l])), str(sum([int(x[2]) for x in l]))])
...
>>> new_lst
[['20150815171000', '3', '5'], ['20150815172000', '12', '15'], ['20150815173000', '6', '7']]
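
The reason for sorting first is that itertools.groupby only groups consecutive items with equal keys. A small sketch with made-up, deliberately unsorted rows shows the difference (the helper first and the sample data are only for illustration):

```python
import itertools

# Made-up, deliberately unsorted rows: key 'a' appears in two separate runs.
rows = [['a', '1'], ['b', '2'], ['a', '3']]

def first(row):
    """Return the grouping key (the first element of a row)."""
    return row[0]

# Without sorting, each run of equal keys becomes its own group.
unsorted_keys = [k for k, _ in itertools.groupby(rows, key=first)]
# After sorting, all rows with the same key end up in one group.
sorted_keys = [k for k, _ in itertools.groupby(sorted(rows, key=first), key=first)]

print(unsorted_keys)
print(sorted_keys)
```

Without the sort, 'a' is reported as two separate groups, so its rows would not be summed together.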

Upvotes: 6
