Reputation: 135

Groupby multiple columns in a list

I have a list of lists like the one below:

[['H1', 'L', '1'],
 ['H1', 'S', '1'],
 ['H2', 'L', '1'],
 ['H2', 'L', '1']]

I want to group by column 1 and column 2. Does Python provide anything for lists that would give me the result below?

H1 L 1
H1 S 1
H2 L 2

Upvotes: 6

Answers (6)

Dan

Reputation: 45741

Another option is to use pandas:

import pandas as pd

# The third column holds integers here, not strings as in the question.
df = pd.DataFrame(
    [['H1', 'L', 1], ['H1', 'S', 1], ['H2', 'L', 1], ['H2', 'L', 1]],
    columns=['H', 'LS', '1'],
)
df.groupby(['H', 'LS']).sum().reset_index()

which returns

    H LS  1
0  H1  L  1
1  H1  S  1
2  H2  L  2

Upvotes: 1

Sohaib Farooqi

Reputation: 5666

You can use itertools.groupby along with operator.itemgetter to achieve the desired result:

>>> from operator import itemgetter
>>> from itertools import groupby
>>>
>>> items = [['H1', 'L', '1'], ['H1', 'S', '1'], ['H2', 'L', '1'], ['H2', 'L', '1']]
>>> [(*k, sum(int(i[2]) for i in g)) for k, g in groupby(items, key=itemgetter(0, 1))]
[('H1', 'L', 1), ('H1', 'S', 1), ('H2', 'L', 2)]
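
One caveat worth illustrating: itertools.groupby only merges consecutive items that share a key, so the one-liner above relies on the input already being ordered by the grouping columns. The sketch below uses a hypothetical reordered copy of the question's data (the name `rows` is mine, not from the question) to show the difference sorting makes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the two ('H2', 'L') rows are not adjacent.
rows = [['H2', 'L', '1'], ['H1', 'L', '1'], ['H2', 'L', '1'], ['H1', 'S', '1']]

key = itemgetter(0, 1)

# Without sorting, groupby starts a new group every time the key changes.
unsorted_result = [(*k, sum(int(r[2]) for r in g))
                   for k, g in groupby(rows, key=key)]

# Sorting by the same key first makes rows with equal keys adjacent.
sorted_result = [(*k, sum(int(r[2]) for r in g))
                 for k, g in groupby(sorted(rows, key=key), key=key)]

print(unsorted_result)
# [('H2', 'L', 1), ('H1', 'L', 1), ('H2', 'L', 1), ('H1', 'S', 1)]
print(sorted_result)
# [('H1', 'L', 1), ('H1', 'S', 1), ('H2', 'L', 2)]
```

If sorting is undesirable (for example, to keep the original order of first appearance), a dict-based approach avoids the requirement altogether.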

Upvotes: 4

englealuze

Reputation: 1653

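You can use a dict (a hash table) to store and look up the running totals. Lookups take constant time on average, so this should be fast.

test = [['H1', 'L', '1'],
        ['H1', 'S', '1'],
        ['H2', 'L', '1'],
        ['H2', 'L', '1']]

d = {}
for x, y, z in test:
    # Add the third column to the running total for this (x, y) pair.
    d[(x, y)] = d.get((x, y), 0) + int(z)

print(d)
# -> {('H1', 'L'): 1, ('H1', 'S'): 1, ('H2', 'L'): 2}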

Upvotes: 0

Skyler

Reputation: 656

The following code works:

from collections import defaultdict

items = [['H1', 'L', '1'],
         ['H1', 'S', '1'],
         ['H2', 'L', '1'],
         ['H2', 'L', '1']]

# Missing keys start at 0, so totals can be accumulated directly.
dictionary = defaultdict(int)

for item in items:
    # Key on the first two columns, sum the third.
    dictionary[tuple(item[:2])] += int(item[2])

for key in dictionary:
    print(key[0], key[1], dictionary[key])
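
The same idea can be wrapped in a small reusable function. This is only a sketch: the name `group_sum` and its parameters `key_cols` and `value_col` are hypothetical, chosen for illustration, and it assumes the value column always contains strings that parse as integers.

```python
from collections import defaultdict


def group_sum(rows, key_cols=(0, 1), value_col=2):
    """Sum value_col for every distinct combination of key_cols."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        totals[key] += int(row[value_col])
    # Dicts keep insertion order (Python 3.7+), so groups come out
    # in the order in which each key was first seen.
    return [[*key, total] for key, total in totals.items()]


items = [['H1', 'L', '1'], ['H1', 'S', '1'], ['H2', 'L', '1'], ['H2', 'L', '1']]
result = group_sum(items)
print(result)
# [['H1', 'L', 1], ['H1', 'S', 1], ['H2', 'L', 2]]
```

Because it accumulates into a dict, this version does not need the input to be sorted.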

Upvotes: 0

Ghilas BELHADJ

Reputation: 14086

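You can use itertools.groupby and then sum up the last column of each group.

from itertools import groupby

# Sample data from the question; rows with equal keys must be adjacent.
l = [['H1', 'L', '1'], ['H1', 'S', '1'], ['H2', 'L', '1'], ['H2', 'L', '1']]

out = []
for k, v in groupby(l, key=lambda x: x[:2]):
    # k is the shared [col1, col2] prefix, v iterates over the group's rows.
    s = sum(int(x[-1]) for x in v)
    out.append(k + [s])

print(out)
# [['H1', 'L', 1], ['H1', 'S', 1], ['H2', 'L', 2]]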

Upvotes: 5

Netwave

Reputation: 42678

Use itertools.groupby with a custom key that picks the columns you need:

groupby(l, key=lambda x: (x[0], x[1]))

Here is a complete example:

import itertools as it

l = [
  ['H1', 'L', '1'],
  ['H1', 'S', '1'],
  ['H2', 'L', '1'],
  ['H2', 'L', '1']
]

for k, v in it.groupby(l, key=lambda x: (x[0], x[1])):
  # Print only the first row of each group.
  print(list(v)[0])

Result (the first row of each group):

['H1', 'L', '1']
['H1', 'S', '1']
['H2', 'L', '1']

Upvotes: 0
