Reputation: 86
I need help with clustering. I want to understand how my variables group into sets, so I tried the following, but I get a
"ValueError: data type not understood"
error. What am I doing wrong? Here is the code:
print ('Minimum value is {0}, maximum is {1}'.format(min_value, max_value))
for position in range(0, len(sub_set)):
    sub_set[position] = (sub_set[position] - min_value)/(max_value - min_value)
data = np.array(sub_set)
print (type(data))
print (len(data))
bandwidth = estimate_bandwidth(data, quantile=0.2, n_samples=len(data))
Regards Matvey
Upvotes: 0
Views: 154
Reputation: 77454
For one-dimensional data, you gain little by doing clustering. Instead, use kernel density estimation or similar methods that exploit orderedness of your data.
Because MeanShift, DBSCAN, etc. are designed for multivariate data, they expect a two-dimensional array of shape (n_samples, n_features), but you passed a one-dimensional array instead. Use reshape, e.g. data.reshape(-1, 1), to fix this.
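A minimal sketch of the reshape fix, assuming scikit-learn's estimate_bandwidth and MeanShift; the sample values below are made up for illustration and are not the asker's data:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Hypothetical 1-D values already scaled to [0, 1] (illustration only).
values = np.array([0.02, 0.05, 0.07, 0.10, 0.48, 0.50,
                   0.53, 0.55, 0.90, 0.93, 0.97, 1.00])

# scikit-learn estimators expect shape (n_samples, n_features),
# so turn the flat array into a single-column matrix.
X = values.reshape(-1, 1)
print(values.shape, "->", X.shape)

# Same call as in the question, now with 2-D input.
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=len(X))
labels = MeanShift(bandwidth=bandwidth).fit(X).labels_
print(bandwidth, labels)
```

Whether the resulting clusters are useful still depends on the bandwidth; the reshape only fixes the input shape.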
In general, learn your APIs: you can use numpy much more effectively by first converting your data into a numpy array, then performing all the other operations in vectorized form.
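import numpy as np
from scipy.stats import gaussian_kde
data = np.array(data, dtype=float)
data = (data - data.min()) / np.ptp(data) # Scale to 0:1; the data.ptp() method was removed in NumPy 2.0
dens = gaussian_kde(data).evaluate(data)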
will yield density estimates for each point in data. Try splitting your data on local minima of this density estimate.
Have a look at this curve for your data:
xs = np.linspace(data.min(), data.max())
plot(xs, gaussian_kde(data).evaluate(xs)) # plot as in pylab / matplotlib.pyplot
Does splitting at local minima yield the desired result?
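Splitting at the local minima of the density estimate could be sketched roughly like this. The sample values and the bw_method=0.2 setting are assumptions chosen for this toy example (the default bandwidth may smooth away the valleys), so tune them for your own data:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical 1-D sample with three visible groups (not the asker's data).
data = np.array([0.02, 0.05, 0.07, 0.10, 0.48, 0.50,
                 0.53, 0.55, 0.90, 0.93, 0.97, 1.00])

# Evaluate the density estimate on a regular grid spanning the data.
# bw_method=0.2 is an assumed bandwidth factor for this toy sample.
grid = np.linspace(data.min(), data.max(), 500)
density = gaussian_kde(data, bw_method=0.2).evaluate(grid)

# An interior grid point is a local minimum if it lies below both neighbours.
interior = density[1:-1]
is_min = (interior < density[:-2]) & (interior < density[2:])
cut_points = grid[1:-1][is_min]

# Assign each value to the interval between consecutive cut points.
groups = np.digitize(data, cut_points)
print(cut_points)
print(groups)
```

Each distinct value in groups is one segment of the number line between two density valleys, so the ordering of the data is preserved.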
Upvotes: 1