Reputation: 19
I have clustered 43574 time series using the EM clusterer, and the output is 24 clusters. I have a few questions. First, is it practical to work with 24 clusters, or is that too many? If I pass the results to a neurosurgeon, who will label the clusters for the purpose of patient management, is that going to work? My most important question: as shown below, a couple of clusters have 0% likelihood. What does that mean, and why are they separate clusters then? Any help would be greatly appreciated. This is what I got:
 0  1892 (  4%)
 1  5153 ( 12%)
 2  1594 (  4%)
 3  1221 (  3%)
 4   122 (  0%)
 5  2714 (  6%)
 6  7092 ( 16%)
 7   141 (  0%)
 8   166 (  0%)
 9   464 (  1%)
10  3331 (  8%)
11  4316 ( 10%)
14  2411 (  6%)
15  2573 (  6%)
17  3063 (  7%)
18   142 (  0%)
19  4211 ( 10%)
20   925 (  2%)
21  2038 (  5%)
22     5 (  0%)
Upvotes: -1
Views: 450
Reputation: 77495
These values are not likelihoods; they are relative cluster sizes.
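import numpy as np
# Cluster sizes copied from the output in the question
data = np.array([1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166,
                 464, 3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5])
# Each cluster's share of the whole data set, in percent
for f in data * 100. / data.sum(): print("%.1f%%" % f, end=" ")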
yields the following relative cluster sizes with an additional digit of precision:
4.3% 11.8% 3.7% 2.8% 0.3% 6.2% 16.3% 0.3% 0.4% 1.1% 7.6% 9.9%
5.5% 5.9% 7.0% 0.3% 9.7% 2.1% 4.7% 0.0%
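So the figures in parentheses are not likelihoods. Each one is cluster size / data set size, apparently rounded to a whole percent. A cluster shown as 0% is therefore not empty; it just holds less than 0.5% of the instances (cluster 22, for example, has 5 of 43574).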
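As a rough check of that reading, the sketch below recomputes each cluster's share and rounds it to a whole percent. The rounding rule is an assumption (it is consistent with every figure in the question, but it is not taken from the clusterer's documentation), and `whole_percent` is a hypothetical helper written only for this illustration.

```python
# Cluster index -> instance count, copied from the output in the question.
sizes = {0: 1892, 1: 5153, 2: 1594, 3: 1221, 4: 122, 5: 2714, 6: 7092,
         7: 141, 8: 166, 9: 464, 10: 3331, 11: 4316, 14: 2411, 15: 2573,
         17: 3063, 18: 142, 19: 4211, 20: 925, 21: 2038, 22: 5}

total = sum(sizes.values())  # number of clustered instances


def whole_percent(count, total):
    """Share of the data set, rounded half-up to a whole percent.

    Hypothetical helper: the rounding rule is assumed, not confirmed.
    """
    return int(count * 100 / total + 0.5)


# Clusters whose share rounds down to 0% even though they are not empty.
shown_as_zero = [c for c, n in sizes.items() if whole_percent(n, total) == 0]

print(total)
print(shown_as_zero)
```

Under this assumption, the clusters that come out as 0% are the same ones printed as 0% in the question, and each of them still contains instances.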
Upvotes: 0