Reputation: 115
The idea is to cluster similar or identical float numbers into one group (a list), while drastically different float numbers go into a different group. A number with no similar or identical neighbours should form a group of its own.
code1:
from itertools import groupby
x =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
groups = [list(g) for _, g in groupby(x, key=int)]
output1:
groups
Out[129]:
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812],
[148.07998657226562],
[147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562]]
What is good about this output is that it preserves the order of the float numbers. What is wrong is that, for instance, [148.07998657226562] is not grouped with the 147s (e.g. 147.83999633789062, 147.95999145507812), because int() truncates it to 148 while the others truncate to 147.
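To make the cause of the split visible, here is a minimal sketch that prints the key `groupby` sees for a few of the values above. The shortened `values` list and the `keys` name are only for illustration.

```python
from itertools import groupby

# A few of the values from the question, in their original order.
values = [147.83999633789062, 147.95999145507812,
          148.07998657226562, 147.95999145507812]

# int() truncates toward zero, so each key is the integer part of the value.
keys = [int(v) for v in values]
print(keys)  # [147, 147, 148, 147]

# groupby only merges adjacent items with equal keys, so the key sequence
# 147, 147, 148, 147 produces three groups instead of one.
groups = [list(g) for _, g in groupby(values, key=int)]
print(len(groups))  # 3
```

Any key that is computed from a single element has this kind of boundary somewhere; only the location of the boundary changes.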
code2 attempt to cluster:
import cluster
data =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x-y))
cl.getlevel(1)
output2:
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[85.79998779296875],
[67.32000732421875],
[76.55999755859375],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562],
[148.07998657226562,
147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812]]
In this case, what is good about the output is the clustering. What is wrong is that the order is altered.
The order is important because these numbers represent coordinates, and the sequence (the x variable) has already been put in its final order. Any additional sorting beforehand or afterwards alters that original order.
Keeping the same order matters because the data is exported in that order.
desired output:
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
148.07998657226562,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562]]
Upvotes: 0
Views: 210
Reputation: 664
If you want to keep grouping with some function and only need to trace the result back to the original indexes, you could do something like this:
data =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
# data_out = cl.getlevel(1)  # the clustered but reordered output2 from the question
data_out = [[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[85.79998779296875],
[67.32000732421875],
[76.55999755859375],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562],
[148.07998657226562,
147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812]]
reindex = []
for d in data:
    for i, group in enumerate(data_out):
        if d in group:
            reindex.append(i)
            break
print(reindex)
This returns [0, 1, 2, 4, 5, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6]
which are group indexes of the original data.
I don't know what the application of your code is, but you probably want a dict next. Dicts preserve insertion order (guaranteed since Python 3.7, and already true in CPython 3.6), so you can replace your data with a dict (data_dict) that maps each value to the group it is in. Note that duplicate values collapse into a single key:
data_dict = dict(zip(data, reindex))
returns:
{39.5999755859375: 0, 48.84002685546875: 1, 58.08001708984375: 2, 67.32000732421875: 4, 76.55999755859375: 5, 85.79998779296875: 3, 147.83999633789062: 7, 147.95999145507812: 7, 148.07998657226562: 7, 199.07998657226562: 6}
Maybe I misunderstood and you want the numbers within each group to be ordered as they are in the input?
If so, you can get that from data_dict as simply as:
group_count = max(data_dict.values()) + 1
data_grouped = [[] for _ in range(group_count)]
for d in data:
    data_grouped[data_dict[d]].append(d)
print(data_grouped)
returns
[[39.5999755859375], [48.84002685546875], [58.08001708984375], [85.79998779296875], [67.32000732421875], [76.55999755859375], [199.07998657226562, 199.07998657226562, 199.07998657226562, 199.07998657226562, 199.07998657226562], [147.83999633789062, 147.95999145507812, 147.95999145507812, 147.95999145507812, 147.95999145507812, 148.07998657226562, 147.95999145507812, 147.95999145507812, 147.95999145507812, 147.95999145507812]]
Once again, you have not said what your code is for.
Note: This is not a very elegant solution.
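The result above still lists the groups in the clusterer's order (85.8 comes before 67.3). The following is a minimal sketch that also orders the groups by first appearance in the input. It assumes every distinct value belongs to exactly one cluster and relies on dict insertion order (Python 3.7+). The shortened `data` and `clusters` lists and the names `cluster_of` and `ordered` are illustrative, not taken from the posts.

```python
# Shortened stand-ins for the question's input and the clusterer's output.
data = [39.6, 48.84, 58.08, 67.32, 76.56, 85.8,
        147.84, 147.96, 148.08, 147.96, 199.08, 199.08]
clusters = [[39.6], [48.84], [58.08], [85.8], [67.32], [76.56],
            [199.08, 199.08], [148.08, 147.84, 147.96, 147.96]]

# Map each distinct value to the index of the cluster that contains it.
cluster_of = {value: i for i, cluster in enumerate(clusters) for value in cluster}

# Walk the input in its original order. A cluster's key is inserted the first
# time one of its members is seen, so the groups end up ordered by first
# appearance, and the members inside each group keep their input order.
ordered = {}
for value in data:
    ordered.setdefault(cluster_of[value], []).append(value)

result = list(ordered.values())
print(result)
```

The dict lookup also replaces the nested loop over all groups, so each input value is resolved in constant time on average.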
Upvotes: 0
Reputation: 19405
Your desired output can be achieved by altering the grouping key to be:
key=lambda f: f//10
But this groups the numbers according to the block of ten they fall in. So, for example, 146.56 and 148.2 will also be grouped together, while 149.9 and 150.1 would be split. groupby only looks at each element individually and constructs a key from it. There is no "memory" of previous numbers, so if you need some relative grouping you will need to do it manually:
groups = []
group = [x[0]]
for num in x[1:]:
    if abs(group[-1] - num) <= 1:
        group.append(num)
    else:
        groups.append(group)
        group = [num]
groups.append(group)
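As a sketch of how the loop above could be packaged, the following wraps it in a function that also accepts an empty input (indexing `x[0]` would raise an IndexError there). The function name `group_consecutive`, the `tolerance` parameter, and the shortened `sample` list are assumptions made for illustration.

```python
def group_consecutive(values, tolerance=1.0):
    """Split values into runs where each item is within `tolerance`
    of the previous item. The input order is never changed."""
    groups = []
    for value in values:
        # Extend the current run if the gap to its last member is small enough.
        if groups and abs(groups[-1][-1] - value) <= tolerance:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


# A shortened version of the question's data, in the original order.
sample = [39.5999755859375, 48.84002685546875, 85.79998779296875,
          147.83999633789062, 147.95999145507812, 148.07998657226562,
          147.95999145507812, 199.07998657226562, 199.07998657226562]

result = group_consecutive(sample)
for group in result:
    print(group)
```

On this sample, 148.07998657226562 stays with the surrounding 147s, which is the behaviour the question asks for.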
Note that this keeps comparing against the last number added to the current group. So, theoretically, you can end up with a chained group such as [145.1, 146.0, 146.9, 147.8, ...], whose first and last members are far more than 1 apart. If that is not desired, you can measure the difference from a fixed point instead. Just change
if abs(group[-1] - num) <= 1:
to:
if abs(group[0] - num) <= 1:
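To compare the two variants side by side, here is a small sketch with the reference point made selectable. The function name `group_by_gap`, the `anchor` parameter, and the `drifting` test sequence are hypothetical and only serve the illustration.

```python
def group_by_gap(values, tolerance=1.0, anchor="last"):
    """Group consecutive values, comparing each new value either with the
    last member of the current group ("last") or with its first ("first")."""
    index = -1 if anchor == "last" else 0
    groups = []
    for value in values:
        if groups and abs(groups[-1][index] - value) <= tolerance:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


# Each step is about 0.9, but the total drift is about 2.7.
drifting = [145.1, 146.0, 146.9, 147.8]

chained = group_by_gap(drifting, anchor="last")
fixed = group_by_gap(drifting, anchor="first")
print(chained)  # [[145.1, 146.0, 146.9, 147.8]]
print(fixed)    # [[145.1, 146.0], [146.9, 147.8]]
```

Comparing with the last member tolerates slow drift within a group, while comparing with the first member bounds the spread of every group to the tolerance. Which one fits depends on what the coordinates represent.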
Upvotes: 1