Blade1024

Reputation: 86

Understanding clustering

I need help with clustering. I want to understand how variables group into sets, so I did the following:

  1. I got the data and made sure that it is of a float type
  2. I normalized these values using x = (x - min)/(max - min), where min and max are the variables that indicate min and max in the data range
  3. I used the np.array function to convert it to a numpy array
  4. Then I tried to use estimate_bandwidth (for MeanShift) or DBSCAN to perform the processing, but it fails with the "ValueError: data type not understood" error. What am I doing wrong?

Here is the code -

print ('Minimum value is {0}, maximum is {1}'.format(min_value, max_value))
for position in range(0, len(sub_set)):
    sub_set[position] = (sub_set[position] - min_value)/(max_value - min_value)

data = np.array(sub_set)

print (type(data))
print (len(data))
bandwidth = estimate_bandwidth(data, quantile=0.2, n_samples=len(data))

Regards Matvey

Upvotes: 0

Views: 154

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77454

For one-dimensional data, you gain little by doing clustering. Instead, use kernel density estimation or similar methods that exploit orderedness of your data.

Because MeanShift, DBSCAN etc. are designed for multivariate data, they expect a two-dimensional array of shape (n_samples, n_features), but you passed a one-dimensional array instead. Use reshape to fix this, e.g. data.reshape(-1, 1).
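As a minimal sketch of the reshape fix, assuming scikit-learn is the library in use: the sample values, the random seed, and the DBSCAN parameters `eps=0.1` and `min_samples=5` are made up for illustration and are not taken from the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MeanShift, estimate_bandwidth

# Hypothetical 1-D sample standing in for the asker's sub_set:
# three loose groups around 1, 5 and 9.
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(1.0, 0.1, 20),
    rng.normal(5.0, 0.1, 20),
    rng.normal(9.0, 0.1, 20),
])

# Vectorized min-max scaling to [0, 1], replacing the Python loop.
scaled = (values - values.min()) / (values.max() - values.min())

# scikit-learn expects shape (n_samples, n_features), so turn the
# 1-D array into a single-column 2-D array.
X = scaled.reshape(-1, 1)

# Same quantile as in the question; n_samples is left at its default.
bandwidth = estimate_bandwidth(X, quantile=0.2)
meanshift_labels = MeanShift(bandwidth=bandwidth).fit_predict(X)

# eps and min_samples are illustrative guesses, not tuned values.
dbscan_labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)
```

The only change that matters for the shape problem is `reshape(-1, 1)`; everything else is scaffolding to make the sketch self-contained.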

In general, learn your APIs: you can use numpy much more effectively by first converting your data into a numpy array, then performing all the other operations in vectorized form.

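import numpy as np
from scipy.stats import gaussian_kde

data = np.asarray(data, dtype=float)
data = (data - data.min()) / np.ptp(data)  # Scale to 0:1 (ndarray.ptp() was removed in NumPy 2.0)
dens = gaussian_kde(data).evaluate(data)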

will yield density estimates for each point in data. Try splitting your data on local minima of this density estimate.

Have a look at this curve for your data:

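xs = np.linspace(data.min(), data.max())  # evaluation grid, also used as the x axis
plt.plot(xs, gaussian_kde(data).evaluate(xs))  # assumes: import matplotlib.pyplot as plt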

Does splitting at local minima yield the desired result?
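One way to try the split, sketched under a few assumptions: the bimodal sample below is synthetic, the grid size of 512 is arbitrary, and `scipy.signal.argrelmin` is just one convenient way to locate the local minima of the sampled density.

```python
import numpy as np
from scipy.signal import argrelmin
from scipy.stats import gaussian_kde

# Hypothetical data already scaled to roughly [0, 1], with two groups.
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(0.2, 0.03, 50),
    rng.normal(0.8, 0.03, 50),
])

# Evaluate the density estimate on a regular grid over the data range.
kde = gaussian_kde(data)
grid = np.linspace(data.min(), data.max(), 512)
density = kde.evaluate(grid)

# Grid positions where the sampled density has a strict local minimum.
cut_points = grid[argrelmin(density)[0]]

# Label each value by the interval between cut points it falls into.
labels = np.searchsorted(cut_points, data)
```

If the default bandwidth smooths away the structure you care about, the `bw_method` argument of `gaussian_kde` is the knob to experiment with.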

Upvotes: 1
