ASking

Reputation: 53

Group data by a tolerance

I have an ordered list:

L = [301.148986835, 301.148986835, 301.148986835, 301.161562835, 301.161562835, 301.16156333500004, 301.167179835, 301.167179835, 301.167179835, 301.167179835, 301.167179835, 301.179755835, 301.179755835, 301.179755835, 301.646611835, 301.659187335, 301.659187335, 301.659187335, 301.659187335, 302.138619335, 302.142316335, 302.151194835, 302.1568118349999, 302.15681183500004, 302.15681183500004, 302.15681183500004, 302.156812335, 302.156812335, 302.156812335, 302.169387835, 302.169387835, 302.169387835, 302.169387835, 302.169387835, 302.169388335, 302.636243335, 302.636243835, 302.648819835, 302.648819835, 303.137565335, 303.140827335, 303.140827335, 303.146443835, 303.146443835, 303.146444335, 303.159019835, 303.159019835, 303.15901983500004, 303.159020335, 303.159020335, 303.15902033500004, 303.63283533500004, 303.638451335, 304.130459335, 304.130459335, 304.14370483499994, 304.14370483499994, 304.14370483499994, 304.148651835, 304.148652335, 304.148652335]

I want to group it with a margin of ±0.5.

The expected output is:

 R = [[301.148986835,
  301.148986835,
  301.148986835,
  301.161562835,
  301.161562835,
  301.16156333500004,
  301.167179835,
  301.167179835,
  301.167179835,
  301.167179835,
  301.167179835,
  301.179755835,
  301.179755835,
  301.179755835,
  301.646611835,
  301.659187335,
  301.659187335,
  301.659187335,
  301.659187335,
  302.138619335],[302.142316335,
  302.151194835,
  302.1568118349999,
  302.15681183500004,
  302.15681183500004,
  302.15681183500004,
  302.156812335,
  302.156812335,
  302.156812335,
  302.169387835,
  302.169387835,
  302.169387835,
  302.169387835,
  302.169387835,
  302.169388335,
  302.636243335,
  302.636243835,
  302.648819835,
  302.648819835,
  303.137565335,
  303.140827335,
  303.140827335,
  303.146443835,
  303.146443835,
  303.146444335,
  303.159019835,
  303.159019835,
  303.15901983500004,
  303.159020335,
  303.159020335,
  303.15902033500004],
[303.63283533500004,
  303.638451335,
  304.130459335,
  304.130459335,
  304.14370483499994,
  304.14370483499994,
  304.14370483499994],[304.148651835,
  304.148652335,
  304.148652335]]

When I use this code (my question is not a duplicate):

def grouper(iterable):
    prev = None
    group = []
    for item in iterable:
        if prev is None or item - prev <= 1:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

I get the same list back as the output.

calculate within a tolerance

Upvotes: 1

Views: 45

Answers (1)

pho

Reputation: 25479

You update prev on every iteration, so each item is compared with its immediate predecessor. Consecutive elements of your list are always within 1 of each other, so a new group never starts. Instead, update prev only when you start a new group.
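As a minimal sketch of that smaller fix, prev can be kept as the first element of the current group. The name grouper_fixed_prev, the width parameter, and the sample values are made up for illustration, and the input is assumed to be sorted in ascending order:

```python
def grouper_fixed_prev(iterable, width=1):
    """Group sorted numbers; each group spans at most `width` from its first item."""
    prev = None  # holds the FIRST item of the current group, not the last item seen
    group = []
    for item in iterable:
        if prev is None or item - prev <= width:
            group.append(item)
            if prev is None:
                prev = item  # remember where the first group starts
        else:
            yield group
            group = [item]
            prev = item  # update prev only when a new group starts
    if group:
        yield group


# Hypothetical sample data, chosen so the gaps are easy to check by hand.
sample = [1.0, 1.4, 1.9, 2.3, 2.9, 3.5]
result = list(grouper_fixed_prev(sample))
print(result)  # [[1.0, 1.4, 1.9], [2.3, 2.9], [3.5]]
```

Here 2.3 starts a new group because it is more than 1 away from 1.0, even though it is only 0.4 away from 1.9.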

Better yet, get rid of prev altogether and always compare against the first element of the group.

I'd also suggest including a tol argument so that the function is more flexible:

def grouper(iterable, tol=0.5):
    tol = abs(tol*2) # Since we're counting from the start of the group, multiply tol by 2
    group = []
    for item in iterable:
        if not group or item - group[0] <= tol:
            group.append(item)
        else:
            yield group
            group = [item]
    if group:
        yield group
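A short usage sketch follows, repeating the function so that it runs on its own. The sample is a hand-picked subset of the values in the question, not the full list, and the input is assumed to be sorted in ascending order:

```python
def grouper(iterable, tol=0.5):
    tol = abs(tol * 2)  # group width, measured from the first item of the group
    group = []
    for item in iterable:
        if not group or item - group[0] <= tol:
            group.append(item)
        else:
            yield group
            group = [item]
    if group:
        yield group


# A few values taken from the list in the question (illustrative subset only).
sample = [
    301.148986835, 301.646611835, 302.138619335,
    302.151194835, 302.648819835,
    303.159019835, 303.63283533500004,
]

groups = list(grouper(sample, tol=0.5))
for g in groups:
    print(g)
# [301.148986835, 301.646611835, 302.138619335]
# [302.151194835, 302.648819835]
# [303.159019835, 303.63283533500004]
```

Each group starts at the first value that is more than 2 * tol above the first element of the previous group, so the boundaries depend on where the first group begins.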

Upvotes: 1