Ecko

Reputation: 115

How to group similar numbers with condition/range

The idea is to group identical or similar float numbers while keeping drastically different values in separate groups.

For example, 1.123, 1.123 and 1.322 should end up in one group, and 2.01 should also count as part of it (an additional condition is required to allow a difference of +1 or -1).

Code:

from itertools import groupby

x =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]

groups = [list(g) for _, g in groupby(x, key=int)]

Output:

groups
Out[129]: 
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812],
[148.07998657226562],
[147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812],
[199.07998657226562,
 199.07998657226562,
 199.07998657226562,
 199.07998657226562,
 199.07998657226562]]

As you can see in the output, [148.07998657226562] is treated as a separate group instead of being grouped with the 147s.

Is there a condition or function I could pass to groupby() as a parameter so that similar numbers such as 147 and 148 (within ±1) end up in the same group?

Upvotes: 2

Views: 540

Answers (2)

Pygirl

Reputation: 13349

Using cluster:

import cluster
data =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x-y))
cl.getlevel(1)

[[39.5999755859375],
 [48.84002685546875],
 [58.08001708984375],
 [85.79998779296875],
 [67.32000732421875],
 [76.55999755859375],
 [199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562],
 [148.07998657226562,
  147.83999633789062,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812]]

Edit:

Source/Credit: https://stackoverflow.com/a/15801233/6660373

data = sorted(data)

def grouper(iterable):
    prev = None
    group = []
    for item in iterable:
        if prev is None or item - prev <= 1:  # "not prev" would misfire when prev == 0
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

list(grouper(data))

[[39.5999755859375],
 [48.84002685546875],
 [58.08001708984375],
 [67.32000732421875],
 [76.55999755859375],
 [85.79998779296875],
 [147.83999633789062,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  148.07998657226562],
 [199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562]]
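
On the original question of whether groupby() itself can take such a condition: its key function only sees one value at a time, so it cannot compare neighbours directly. One possible workaround, sketched here under assumptions, is a key object that remembers the previous value and returns a running group number. The class name GapKey and the tol parameter are made up for illustration, the input is a shortened and rounded version of the data, and the approach relies on groupby() calling the key once per element, in order.

```python
from itertools import groupby


class GapKey:
    """Hypothetical stateful key: the group id grows when the gap to the previous value exceeds tol."""

    def __init__(self, tol=1.0):
        self.tol = tol
        self.prev = None
        self.group_id = 0

    def __call__(self, value):
        # Start a new group only when the jump from the previous value is too large.
        if self.prev is not None and abs(value - self.prev) > self.tol:
            self.group_id += 1
        self.prev = value
        return self.group_id


# Shortened, rounded sample of the data from the question (unsorted on purpose).
values = [39.6, 48.84, 147.84, 147.96, 148.08, 147.96, 199.08]
groups = [list(g) for _, g in groupby(values, key=GapKey(1.0))]
```

Like grouper() above, this compares each value only with the one right before it, so a fresh GapKey instance is needed for every groupby() call.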

Upvotes: 2

Roy Cohen

Reputation: 1570

You can always roll your own algorithm. For each element, check whether it is close enough to every element of the last group; if so, add it to that group, otherwise start a new group containing only that element.

x = [39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]

groups = []
for item in x:
    if len(groups) == 0:
        groups.append([item])
        continue
    curr_group = groups[-1]
    if all(abs(item - other) < 1 for other in curr_group):
        curr_group.append(item)
    else:
        groups.append([item])

Output:

[[39.5999755859375],
 [48.84002685546875],
 [58.08001708984375],
 [67.32000732421875],
 [76.55999755859375],
 [85.79998779296875],
 [147.83999633789062,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  148.07998657226562,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812],
 [199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562]]

We can generalize this answer by turning this into a function:

from typing import TypeVar, Iterable, Callable

T = TypeVar('T')
def continuous_groupby(iterable: Iterable[T], predicate: Callable[[list[T], T], bool]) -> list[list[T]]:
    """Divide the iterable into consecutive groups; the predicate decides whether the next value joins the current group or starts a new one."""
    groups = []
    for item in iterable:
        if len(groups) == 0:
            groups.append([item])
            continue
        curr_group = groups[-1]
        if predicate(curr_group, item):
            curr_group.append(item)
        else:
            groups.append([item])
    return groups

x = [39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
groups = continuous_groupby(x, lambda group, item: all(abs(item - other) < 1 for other in group))
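
The choice of predicate matters when values drift slowly. As a rough sketch (the function is restated in condensed form so the block runs on its own, and the drift list is invented for illustration), comparing against every element of the group caps the group's total spread, while comparing only against the last element lets a group chain along:

```python
def continuous_groupby(items, predicate):
    """Condensed restatement of the function above, without type hints."""
    groups = []
    for item in items:
        if groups and predicate(groups[-1], item):
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


# Invented sample: each step is 0.6, but the overall spread is 1.8.
drift = [1.0, 1.6, 2.2, 2.8]

# Every element of the group must be within 1 of the new item.
within_all = continuous_groupby(
    drift, lambda group, item: all(abs(item - other) < 1 for other in group)
)

# Only the most recent element must be within 1 of the new item.
within_last = continuous_groupby(
    drift, lambda group, item: abs(item - group[-1]) < 1
)
```

Here within_all splits into [[1.0, 1.6], [2.2, 2.8]], while within_last keeps all four values in a single group.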

Note: on Python versions before 3.9, list[T] won't work. Instead, import List from typing and use List[T].
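
For reference, the signature with typing.List might look like the sketch below. The parameter name items and the shortened sample data are assumptions made for this example; the logic is meant to match the function above.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def continuous_groupby(
    items: Iterable[T],
    predicate: Callable[[List[T], T], bool],
) -> List[List[T]]:
    """Same grouping logic, annotated with typing.List for older Python versions."""
    groups: List[List[T]] = []
    for item in items:
        if groups and predicate(groups[-1], item):
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


# Shortened, rounded sample of the data from the question.
sample = [147.84, 147.96, 148.08, 199.08]
result = continuous_groupby(
    sample, lambda group, item: all(abs(item - other) < 1 for other in group)
)
```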

Upvotes: 1
