pdubois

Reputation: 7790

Simple clustering from list of list in Python

I have the following list of lists that contains 5 entries:

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]

I'd like to 'cluster' the list above roughly as follows:

1. Sort `my_lol` by the numeric value in each entry, ascending.
2. Pick the lowest entry in `my_lol` as the key of the first cluster.
3. Calculate the difference between the current entry's value and the previous entry's value.
4. If the difference is less than the threshold, add the current entry as a member of the current cluster; otherwise, make the current entry the key of the next cluster.
5. Repeat for the remaining entries.

At the end of the day I'd like to get the following dictionary of lists:

dol = {'x':['x','a','k'], 'p':['p','b']}

Essentially, that dictionary of lists is a clustering that contains two clusters, each keyed by its lowest entry.

I tried the following but got stuck at step 3. What's the right way to do it?

import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids != "-":
        diff = abs(tmp_val - val)

        if diff < thres:
            print(tmp_ids)
            dol[tmp_ids].append(tmp_ids)

    tmp_ids = ids
    tmp_val = val

print(json.dumps(dol, indent=4))

Upvotes: 2

Views: 527

Answers (2)

Ethan Furman

Reputation: 69021

import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
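for ids, val in my_lol_sorted:
    if tmp_ids == "-":
        # First entry: it becomes the key of the first cluster.
        tmp_ids = ids
    else:
        diff = abs(tmp_val - val)
        if diff > thres:
            # Gap to the previous value is too large: start a new cluster.
            tmp_ids = ids
    # Every entry, including the key itself, joins the current cluster.
    dol[tmp_ids].append(ids)
    tmp_val = val

print(json.dumps(dol, indent=4))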
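
The same loop can also be wrapped in a function so that the threshold becomes a parameter. This is only a minimal sketch: the name `cluster_by_gap` and its `thres` argument are made up for illustration and are not from the question. It assumes each entry is a `[key, value]` pair, as in `my_lol`.

```python
import operator
from collections import defaultdict


def cluster_by_gap(pairs, thres=0.1):
    """Group [key, value] pairs; a gap larger than thres starts a new cluster.

    cluster_by_gap is a hypothetical helper name, not part of any library.
    """
    clusters = defaultdict(list)
    cluster_key = None
    prev_val = None
    for key, val in sorted(pairs, key=operator.itemgetter(1)):
        # The first entry, or any entry too far from its predecessor,
        # becomes the key of a new cluster.
        if cluster_key is None or abs(val - prev_val) > thres:
            cluster_key = key
        clusters[cluster_key].append(key)
        prev_val = val
    return dict(clusters)


my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
print(cluster_by_gap(my_lol))
```

For the `my_lol` data from the question, this should give the two clusters asked for, keyed by `'x'` and `'p'`.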

Upvotes: 1

irrelephant

Reputation: 4111

Try this:

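# Reuses defaultdict and my_lol_sorted from the question's code.
dol = defaultdict(list)
if my_lol_sorted:
    thres = 0.1
    # Seed with the first entry, so the first comparison gives diff == 0.
    tmp_ids, tmp_val = my_lol_sorted[0]

    for ids, val in my_lol_sorted:
        diff = abs(tmp_val - val)
        if diff > thres:
            # Gap too large: this entry becomes the key of a new cluster.
            tmp_ids = ids
        dol[tmp_ids].append(ids)
        tmp_val = val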
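
One thing worth knowing about this approach: because each entry is compared with the previous entry (step 3 of the question) rather than with the cluster key, clusters can chain. A member may end up further than `thres` from its key, as long as every consecutive gap is small. The sketch below shows this with made-up values; `chain_clusters` is a hypothetical wrapper name around the same loop and expects input already sorted by value.

```python
from collections import defaultdict


def chain_clusters(sorted_pairs, thres=0.1):
    # chain_clusters is a hypothetical name; input must already be sorted by value.
    dol = defaultdict(list)
    if sorted_pairs:
        tmp_ids, tmp_val = sorted_pairs[0]
        for ids, val in sorted_pairs:
            if abs(tmp_val - val) > thres:
                tmp_ids = ids
            dol[tmp_ids].append(ids)
            tmp_val = val
    return dict(dol)


# Made-up data: each consecutive gap is about 0.08, but 'r' is 0.16 away from 'p'.
chained = chain_clusters([['p', 1.00], ['q', 1.08], ['r', 1.16]])
print(chained)
```

Here all three entries should land in the single cluster keyed by `'p'`. If that is not what you want, compare each value with the value of the cluster key instead of with the previous value.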

Upvotes: 1
