komarkovich

Reputation: 2319

Python list group and sum with more fields

I have a list of records, each of the form (string, integer, integer), and I would like to sum the two integer fields.

myList= [[["26-07-2017",2,0], ["26-07-2017",3,0], ["27-07-2017",1,0], ["27-07-2017",0,1]]]

Now I would like to group by date and sum the int fields. So the output should be like this:

sumList= [[["26-07-2017",5,0], ["27-07-2017",1,1]]]

How can I accomplish this? Thank you for the answer.

Upvotes: 0

Views: 1683

Answers (4)

Moses Koledoye

Reputation: 78556

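You can use itertools.groupby to group the items on the date, then use reduce to sum the numbers in each group. Note that groupby only merges adjacent items, so this relies on rows with the same date being next to each other, as they are in your list: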

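from functools import reduce
from itertools import groupby

# Start each group from [0, 0] so groups of any size (even a single row) work
lst = [[k] + reduce(lambda acc, row: [acc[0] + row[1], acc[1] + row[2]], g, [0, 0])
       for k, g in groupby(myList[0], lambda x: x[0])]
print([lst])
# [[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]]

The functools import is required on Python 3; it also works on Python 2.6 and later.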


You could avoid the relatively less intuitive reduce (also in deference to GvR) by taking the sums in a for loop:

from itertools import groupby

lst = []
for k, g in groupby(myList[0], lambda x: x[0]):
    # Transpose the integer columns of the group and sum each one
    sums = [sum(d) for d in zip(*(t[1:] for t in g))]
    lst.append([k] + sums)
print([lst])
# [[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]]
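
If the dates in your real data are not already contiguous, the list has to be sorted before it goes through groupby. A minimal sketch of that, assuming the dates are DD-MM-YYYY strings; the names records and date_key are made up for illustration and the interleaved input is hypothetical:

```python
from datetime import datetime
from itertools import groupby

# Hypothetical unsorted input: same shape as myList[0], but dates interleaved.
records = [["27-07-2017", 1, 0], ["26-07-2017", 2, 0],
           ["27-07-2017", 0, 1], ["26-07-2017", 3, 0]]


def date_key(row):
    # Parse DD-MM-YYYY so the sort is chronological rather than lexicographic.
    return datetime.strptime(row[0], "%d-%m-%Y")


summed = []
# sorted() makes equal dates adjacent, which is what groupby needs.
for day, group in groupby(sorted(records, key=date_key), key=lambda row: row[0]):
    # zip(*...) transposes the integer columns so each can be summed.
    totals = [sum(col) for col in zip(*(row[1:] for row in group))]
    summed.append([day] + totals)

print([summed])
# [[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]]
```

Sorting on the parsed date rather than the raw string matters once the data spans more than one month, since "01-08-2017" would otherwise sort before "26-07-2017".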

Upvotes: 4

Akshay Kandul

Reputation: 602

You can use a dict keyed by date to accumulate the sum of each value:

Code:

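myList = [[["26-07-2017", 2, 0], ["26-07-2017", 3, 0], ["27-07-2017", 1, 0], ["27-07-2017", 0, 1]]]
dic = {}
for x in myList[0]:
    try:
        # Date already seen: add to the running totals
        dic[x[0]][0] = dic[x[0]][0] + x[1]
        dic[x[0]][1] = dic[x[0]][1] + x[2]
    except KeyError:
        # First time this date appears: start its totals
        dic[x[0]] = [x[1], x[2]]
result = [[k, v[0], v[1]] for k, v in dic.items()]
print(result)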

Output:

[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]
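
The same idea can be written without the try/except by letting collections.defaultdict create the starting totals. This is only a sketch of an alternative, not a change to the approach above; the names totals and result are my own:

```python
from collections import defaultdict

myList = [[["26-07-2017", 2, 0], ["26-07-2017", 3, 0],
           ["27-07-2017", 1, 0], ["27-07-2017", 0, 1]]]

# Each new date starts with a fresh [0, 0] pair of running totals.
totals = defaultdict(lambda: [0, 0])
for date, a, b in myList[0]:
    totals[date][0] += a
    totals[date][1] += b

# Dicts keep insertion order on Python 3.7+, so dates come out in first-seen order.
result = [[date, a, b] for date, (a, b) in totals.items()]
print(result)
# [['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]
```

Unlike groupby, this does not need rows with the same date to be adjacent.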

Upvotes: 0

Wesley Bowman

Reputation: 1396

You can do this with pandas:

import pandas as pd

df = pd.DataFrame(myList[0])
answer = df.groupby([0]).sum()

gives me

            1  2
0               
26-07-2017  5  0
27-07-2017  1  1

EDIT: I used your list as is above, but with a few modifications, the code makes a bit more sense:

# name the columns
df = pd.DataFrame(myList[0], columns=['date', 'int1', 'int2'])

# group on the date column
df.groupby(['date']).sum()

returns

            int1  int2
date                  
26-07-2017     5     0
27-07-2017     1     1

and the dataframe looks like:

         date  int1  int2
0  26-07-2017     2     0
1  26-07-2017     3     0
2  27-07-2017     1     0
3  27-07-2017     0     1

Upvotes: 2

Davy M

Reputation: 1696

I would use a dictionary to keep track of entries that share the same first field, like so:

my_dict = {}
# myList wraps the rows in an extra outer list, so iterate over myList[0]
for entry in myList[0]:
    if entry[0] not in my_dict:
        # This makes my_dict hold dates as keys and a list of 2 integers as values
        my_dict[entry[0]] = entry[1:]
    else:
        # In the case that the date is already in my_dict, add the new integers
        my_dict[entry[0]][0] += entry[1]
        my_dict[entry[0]][1] += entry[2]
# Now my_dict holds dates as keys with the sums as values
# If I really need it to be in the list format you asked for:
sumList = []
for value in my_dict:
    # append takes a single argument, so build the row as one list
    sumList.append([value, my_dict[value][0], my_dict[value][1]])
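
If the rows might later grow more than two integer fields, the loop above can be generalized. A rough sketch for Python 3, assuming every row has the same number of numeric fields after the date; the function name sum_by_first_field is hypothetical:

```python
def sum_by_first_field(rows):
    """Group rows on their first element and sum the remaining fields column-wise."""
    sums = {}
    for key, *values in rows:
        if key in sums:
            # Add the new values to the running totals, column by column.
            sums[key] = [old + new for old, new in zip(sums[key], values)]
        else:
            # First time this key appears: its values are the starting totals.
            sums[key] = list(values)
    return [[key] + total for key, total in sums.items()]


myList = [[["26-07-2017", 2, 0], ["26-07-2017", 3, 0],
           ["27-07-2017", 1, 0], ["27-07-2017", 0, 1]]]

# Re-wrap the result in an outer list to match the format in the question.
sumList = [sum_by_first_field(myList[0])]
print(sumList)
# [[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]]
```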

Upvotes: 0
