peixe

Reputation: 1292

Group data into specified intervals satisfying certain condition

I'd like to split the items in this list into new lists...

truc = [['12', 'brett', 5548],
       ['22.3', 'troy', 9514],
       ['8.1', 'hings', 12635],
       ['34.2', 'dab', 17666],
       ['4q3', 'sigma', 18065],
       ['4q3', 'delta', 18068]]

... grouping them by the last field into bins of size 3500. So the ideal result would be:

firstSort = [['34.2', 'dab', 17666],
            ['4q3', 'sigma', 18065],
            ['4q3', 'delta', 18068]]

secondSort = [['22.3', 'troy', 9514],
             ['8.1', 'hings', 12635]]

lastSort = [['12', 'brett', 5548]]

I tried using the itertools.groupby() function, but I can't find a way to specify the bin size.

Upvotes: 0

Views: 797

Answers (3)

Ashwini Chaudhary

Reputation: 251021

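Using defaultdict():

from collections import defaultdict

lis = [['12', 'brett', 5548],
       ['22.3', 'troy', 9514],
       ['8.1', 'hings', 12635],
       ['34.2', 'dab', 17666],
       ['4q3', 'sigma', 18065],
       ['4q3', 'delta', 18068]]

d = defaultdict(list)
for i, x in enumerate(lis):
    # Put x into the first existing group that already holds an item whose
    # last field is within 3500 of x's last field (only one group, then stop).
    for key in d:
        if any(abs(z[-1] - x[-1]) <= 3500 for z in d[key]):
            d[key].append(x)
            break
    else:
        # The loop ended without a break: no group was close enough, start a new one.
        d[i].append(x)

# Dicts keep insertion order (Python 3.7+), so groups print in creation order.
print(list(d.values()))

Output: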

[[['12', 'brett', 5548]],
 [['22.3', 'troy', 9514], ['8.1', 'hings', 12635]], 
 [['34.2', 'dab', 17666], ['4q3', 'sigma', 18065], ['4q3', 'delta', 18068]]
]
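One caveat about this single-pass approach: it does not sort the input, so the group a row lands in can depend on the order in which rows arrive. The sketch below wraps a first-match version of the loop in a hypothetical function, group_by_proximity (the name and its width parameter are assumptions, not part of the answer), and runs it on made-up toy rows before and after sorting by the last field. It assumes Python 3.7+ so that dicts keep insertion order.

```python
from collections import defaultdict


def group_by_proximity(rows, width=3500):
    """Hypothetical helper: put each row into the first existing group that
    already holds a row whose last field is within `width`; otherwise start
    a new group keyed by the row's index."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        for key in groups:
            if any(abs(other[-1] - row[-1]) <= width for other in groups[key]):
                groups[key].append(row)
                break
        else:
            # No close-enough group found: start a new one.
            groups[i].append(row)
    return list(groups.values())


# Made-up toy rows for illustration only.
rows = [['a', 0], ['b', 5000], ['c', 2500]]

print(group_by_proximity(rows))
print(group_by_proximity(sorted(rows, key=lambda r: r[-1])))
```

With the unsorted input, 'b' (5000) ends up alone even though 'c' (2500) is within 3500 of it, because 'c' arrives later and joins the first matching group. After sorting, all three rows chain into one group. Sorting by the last field first makes the result depend only on the values, not on their order.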

Upvotes: 0

LSerni

Reputation: 57418

A basic binner with groupby:

from itertools import groupby

# data must be sorted by the last field: groupby only groups consecutive items

data = [['12', 'brett', 5548],
        ['22.3', 'troy', 9514],
        ['8.1', 'hings', 12635],
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]]

groups = []
# Bin index = last field // 3500, so bin k covers [k*3500, (k+1)*3500).
for k, g in groupby(data, lambda x: x[-1] // 3500):
    groups.append(list(g))

print(groups)

Output:

[
    [
        ['12', 'brett', 5548]
    ],
    [
        ['22.3', 'troy', 9514]
    ],
    [
        ['8.1', 'hings', 12635]
    ],
    [
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]
    ]
]

You can then coalesce adjacent groups whenever the maximum of one group minus the minimum of the preceding group is less than 3500. You would then get:

[
    [
        ['12', 'brett', 5548]
    ],
    [
        ['22.3', 'troy', 9514],
        ['8.1', 'hings', 12635]
    ],
    [
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]
    ]
]
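The coalescing step is only described in words, so here is a minimal sketch of it, assuming the rule stated above: merge a group into the preceding (possibly already merged) group when the current group's maximum minus that group's minimum is below 3500. The coalesce helper and its width parameter are hypothetical names; the input is the question's list, which is already sorted by the last field.

```python
from itertools import groupby

truc = [['12', 'brett', 5548],
        ['22.3', 'troy', 9514],
        ['8.1', 'hings', 12635],
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]]


def coalesce(groups, width=3500):
    """Hypothetical helper: merge each group into the previous merged group
    when max(current) - min(previous) < width."""
    merged = []
    for g in groups:
        if merged and max(r[-1] for r in g) - min(r[-1] for r in merged[-1]) < width:
            merged[-1].extend(g)
        else:
            merged.append(list(g))  # copy so the input bins are not mutated
    return merged


# Step 1: fixed-width bins (input already sorted by the last field).
bins = [list(g) for _, g in groupby(truc, key=lambda r: r[-1] // 3500)]
# Step 2: coalesce neighbouring bins.
result = coalesce(bins)
print(result)
```

Here the 9514 and 12635 bins merge (12635 - 9514 = 3121), while the 17666 to 18068 bin stays separate because its maximum minus the merged group's minimum (18068 - 9514) is well above 3500.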

Even with coalescing after the groupby, I think that Anurag Uniyal's solution would still achieve better grouping in the average case.
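One plausible reason for that preference: fixed-width bins can split values that are arbitrarily close if they fall on either side of a bin edge, whereas gap-based grouping (as in Anurag Uniyal's answer) keeps them together. A small sketch on plain numbers, using made-up values just below and above the bin edge at 7000; fixed_bins and gap_groups are hypothetical helper names.

```python
from itertools import groupby


def fixed_bins(values, width=3500):
    # Fixed-width bins: values in [k*width, (k+1)*width) share bin k.
    return [list(g) for _, g in groupby(sorted(values), key=lambda v: v // width)]


def gap_groups(values, gap=3500):
    # Gap-based: start a new group when the jump from the previous value exceeds `gap`.
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


values = [6999, 7001]  # made-up values straddling the bin edge at 2 * 3500
print(fixed_bins(values))  # [[6999], [7001]]
print(gap_groups(values))  # [[6999, 7001]]
```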

Upvotes: 1

Anurag Uniyal

Reputation: 88787

This is trivial to do without itertools:

truc = [['12', 'brett', 5548],
       ['22.3', 'troy', 9514],
       ['8.1', 'hings', 12635],
       ['34.2', 'dab', 17666],
       ['4q3', 'sigma', 18065],
       ['4q3', 'delta', 18068]]

truc.sort(key=lambda a:a[-1])
groups = [[]]
last_row = None
for row in truc:
    if last_row is not None and row[-1] - last_row[-1] > 3500:
        groups.append([])
    last_row = row
    groups[-1].append(row)

import pprint
pprint.pprint(groups)

Output:

[[['12', 'brett', 5548]],
 [['22.3', 'troy', 9514], ['8.1', 'hings', 12635]],
 [['34.2', 'dab', 17666], ['4q3', 'sigma', 18065], ['4q3', 'delta', 18068]]]
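Note that this gap rule chains: a row joins the current group if it is within 3500 of the previous row, not of the group's first row, so a single group can span much more than 3500. A minimal sketch, wrapping the same loop in a hypothetical group_rows_by_gap function (it uses sorted() instead of truc.sort() so the caller's list is left untouched) and running it on made-up rows spaced 3000 apart:

```python
def group_rows_by_gap(rows, gap=3500):
    """Hypothetical wrapper around the loop above: sort by the last field and
    start a new group whenever the jump from the previous row exceeds `gap`."""
    groups = []
    last = None
    for row in sorted(rows, key=lambda r: r[-1]):
        if last is None or row[-1] - last[-1] > gap:
            groups.append([])
        groups[-1].append(row)
        last = row
    return groups


# Made-up rows: each step is 3000, but the ends are 9000 apart.
rows = [['a', 0], ['b', 3000], ['c', 6000], ['d', 9000]]
groups = group_rows_by_gap(rows)
spans = [g[-1][-1] - g[0][-1] for g in groups]
print(len(groups))  # 1
print(spans)        # [9000]
```

If every group must span less than 3500, this needs an extra check against the group's first row, or fixed-width bins, which bound the span but can split nearby values.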

Upvotes: 3
