Reputation: 7790
I have the following list of lists that contains 5 entries:
my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
I'd like to 'cluster' the above list, roughly as follows:
1. Sort `my_lol` in ascending order by the numeric value of each entry.
2. Pick the lowest entry in `my_lol` as the key of the first cluster.
3. Calculate the difference between the value of the current entry and that of the previous one.
4. If the difference is less than the threshold, add the current entry as a member of the current
cluster; otherwise make the current entry's key the key of the next cluster.
5. Repeat for the remaining entries until the list is exhausted.
At the end of the day I'd like to get the following dictionary of lists:
dol = {'x':['x','a','k'], 'p':['p','b']}
Essentially, that dictionary of lists is a clustering that contains two clusters, each keyed by its lowest-valued entry.
I tried the following but got stuck at step 3. What's the right way to do it?
import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"
dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids != "-":
        diff = abs(tmp_val - val)
        if diff < thres:
            print(tmp_ids)
            dol[tmp_ids].append(tmp_ids)
    tmp_ids = ids
    tmp_val = val
print(json.dumps(dol, indent=4))
Upvotes: 2
Views: 527
Reputation: 69021
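import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"  # sentinel: no cluster has been opened yet
dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids == "-":
        # The first (lowest) entry becomes the key of the first cluster.
        tmp_ids = ids
    else:
        diff = abs(tmp_val - val)
        if diff > thres:
            # Gap to the previous value is too large: open a new cluster.
            tmp_ids = ids
    # Every entry, including the key itself, joins the current cluster.
    dol[tmp_ids].append(ids)
    tmp_val = val
print(json.dumps(dol, indent=4))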
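The same loop can also be wrapped in a function for Python 3. This is only a sketch: the name `cluster_by_gap` and the `None` sentinel are choices made here for illustration, and it assumes every entry is a `[key, value]` pair.

```python
from operator import itemgetter


def cluster_by_gap(pairs, thres=0.1):
    """Group [key, value] pairs by value (hypothetical helper name).

    A new cluster starts whenever the gap between a value and the
    previous value in sorted order exceeds thres.
    """
    clusters = {}
    cluster_key = None  # key of the cluster currently being filled
    prev_val = None     # value of the previous entry in sorted order
    for key, val in sorted(pairs, key=itemgetter(1)):
        if cluster_key is None or abs(val - prev_val) > thres:
            cluster_key = key
        clusters.setdefault(cluster_key, []).append(key)
        prev_val = val
    return clusters


my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
dol = cluster_by_gap(my_lol)
print(dol)  # {'x': ['x', 'a', 'k'], 'p': ['p', 'b']}
```

An empty input simply returns an empty dictionary, so no separate length check is needed.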
Upvotes: 1
Reputation: 4111
Try this:
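from collections import defaultdict

# Assumes my_lol_sorted from the question: my_lol sorted ascending by value.
dol = defaultdict(list)
if my_lol_sorted:  # guard against an empty list
    thres = 0.1
    # Start with the lowest entry as the key of the first cluster.
    tmp_ids, tmp_val = my_lol_sorted[0]
    for ids, val in my_lol_sorted:
        diff = abs(tmp_val - val)
        if diff > thres:
            # Gap to the previous value is too large: open a new cluster.
            tmp_ids = ids
        dol[tmp_ids].append(ids)
        tmp_val = val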
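One thing to be aware of: each value is compared with the previous entry, not with the value of the cluster's key, so a cluster can span more than `thres` in total. A small sketch with made-up data (the `chain` list is hypothetical, not from the question):

```python
from collections import defaultdict

# Hypothetical data: neighbouring gaps are about 0.08, below the 0.1 threshold,
# but the first and last values are about 0.16 apart.
chain = [['a', 1.00], ['b', 1.08], ['c', 1.16]]
chain_sorted = sorted(chain, key=lambda pair: pair[1])

thres = 0.1
dol = defaultdict(list)
if chain_sorted:
    tmp_ids, tmp_val = chain_sorted[0]
    for ids, val in chain_sorted:
        if abs(tmp_val - val) > thres:
            tmp_ids = ids
        dol[tmp_ids].append(ids)
        tmp_val = val

print(dict(dol))  # {'a': ['a', 'b', 'c']}
```

If a cluster should instead stay within `thres` of its key, one option is to compare `val` with the value recorded when the cluster was opened rather than with the previous entry.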
Upvotes: 1