urgeo
urgeo

Reputation: 641

group list elements by their difference with respect to each other

Given a list

A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]

is there an easy way to group all those clusters where the difference between successive elements is smaller than 3?

That is, obtain something like:

[13, 15, 17, 19, 21], [58, 59, 61, 63], [78, 79, 81]

I was wondering whether there exists any built-in function, but I couldn't find anything similar. I was trying to figure it out using the groupby of the itertools, but I'm stuck. Thank you in advance.

Upvotes: 6

Views: 1346

Answers (4)

Ajax1234
Ajax1234

Reputation: 71461

You can use itertools.groupby:

import itertools
A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]
new_a = [(A[i+1]-A[i], A[i]) for i in range(len(A)-1)]
a = [[a, [c for _, c in b]] for a, b in itertools.groupby(new_a, key=lambda x:x[0] < 3)]
final_groups = [a[i][-1]+[a[i+1][-1][0]] if a[i+1][-1][0] - a[i][-1][-1] < 3 else a[i][-1] for i in range(len(a)-1) if a[i][0]]

Output:

[[13, 15, 17, 18, 19, 21], [58, 59, 61, 63], [78, 79, 81]]

Upvotes: 2

sytech
sytech

Reputation: 40941

Based on your comment

The important thing is to "store elements until" the condition is satisfied

You could use itertools.takewhile for this.

takewhile(predicate, iterable) --> takewhile object

Return successive entries from an iterable as long as the predicate evaluates to true for each entry.

This solution certainly has room for improvement, but the takeaway is the use of takewhile

class Grouper:
    """simple class to perform comparison when called, storing last element given"""
    def __init__(self, diff):
        self.last = None
        self.diff = diff
    def predicate(self, item):
        if self.last is None:
            return True
        return abs(self.last - item) < self.diff
    def __call__(self, item):
        """called with each item by takewhile"""
        result = self.predicate(item)
        self.last = item
        return result


def group_by_difference(items, diff=3):
    results = []
    start = 0
    remaining_items = items
    while remaining_items:
        g = Grouper(diff)
        group = [*itertools.takewhile(g, remaining_items)]
        results.append(group)
        start += len(group)
        remaining_items = items[start:]
    return results

This gives you grouped items with singleton clusters.

[[1],
 [6],
 [13, 15, 17, 18, 19, 21],
 [29],
 [36],
 [53],
 [58, 59, 61, 63],
 [78, 79, 81],
 [102],
 [114]]

Upvotes: 1

Graipher
Graipher

Reputation: 7186

Similarly to this answer, which asked for runs of the same number, one can use numpy.split here:

import numpy as np

def plateaus(A, atol=3):
    runs = np.split(A, np.where(np.abs(np.diff(A)) >= atol)[0] + 1)
    return [list(x) for x in runs if len(x) > 1]

A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]
print(plateaus(A))
[[13, 15, 17, 18, 19, 21], [58, 59, 61, 63], [78, 79, 81]]

Without filtering on the length, this gives you singleton clusters just like the itertools.takehwile approach by @sytech.

Upvotes: 1

Rakesh
Rakesh

Reputation: 82765

This is one approach using an iteration.

Ex:

A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]
res = []
temp = []
l = len(A)-1

for i,v in enumerate(A):
    if i+1 > l:
        break

    if abs(v - A[i+1]) < 3:
        temp.append(v)
    else:
        if temp:
            temp.append(v)
            res.append(temp)
            temp = []
print(res)

Output:

[[13, 15, 17, 18, 19, 21], [58, 59, 61, 63], [78, 79, 81]]

Upvotes: 1

Related Questions