Evan

Reputation: 311

Clustering algorithm for grouping based on y distance from 0

To build a supervised learning model, I have computed the daily standard deviation (st.dev) of the variable of interest. I would like to find clusters of daily st.dev, i.e., group 1 with the smallest st.dev, group 2 with a larger st.dev, and so on.

The result of the clustering will provide the categorical labels for a CART algorithm. It is suspected there are 4 classes.

I have a 2D matrix with the dates 'X' and the daily st.dev 'y-true'. I first convert the date column to a numeric type:

mat.X = pd.to_numeric(mat['X']) 

Using k-means in the sklearn lib, this is the result:

kmeans = KMeans(n_clusters=3)
kmeans = kmeans.fit(mat)
labels = kmeans.predict(mat)
plt.scatter(mat[:,0],mat[:,1], c=kmeans.labels_, cmap='rainbow')  


The result does not cluster the data by the st.dev values on the y axis. Is this a sound approach? Should the columns be switched so that the clustering is done on st.dev?

Upvotes: 1

Views: 126

Answers (1)

Bert Kellerman

Reputation: 1629

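You say you want to cluster only on std-dev, but you are clustering on two dimensions: the std-dev and the date. If 'X' is a datetime column, pd.to_numeric turns it into nanoseconds since the epoch, so its values are many orders of magnitude larger than the std-dev and dominate the distances k-means uses.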

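Try clustering on the std-dev column alone. KMeans expects a 2D array of shape (n_samples, n_features), so the single column has to be reshaped first:

std = mat[:, 1].reshape(-1, 1)  # one feature: the daily std-dev
kmeans = KMeans(n_clusters=3)
labels = kmeans.fit_predict(std)  # fit and assign a cluster to each day
plt.scatter(mat[:, 0], mat[:, 1], c=labels, cmap='rainbow')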
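
One more thing to watch for: the cluster ids that k-means returns are arbitrary, so cluster 0 is not necessarily the one with the smallest std-dev. If you want "group 1 = smallest, group 2 = bigger" as labels for the CART step, you can rank the clusters by their centers. Below is a minimal sketch of that idea. The data is synthetic and made up for illustration (it is not your data), the variable names are hypothetical, and n_clusters=4 only reflects the 4 classes you suspect.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the daily std-dev column (hypothetical data):
# four well-separated groups of 50 days each.
rng = np.random.default_rng(0)
std_dev = np.concatenate([
    rng.normal(0.5, 0.05, 50),
    rng.normal(1.5, 0.05, 50),
    rng.normal(3.0, 0.05, 50),
    rng.normal(5.0, 0.05, 50),
])

# KMeans expects shape (n_samples, n_features), so use a single-column 2D array.
X = std_dev.reshape(-1, 1)

# n_clusters=4 is an assumption taken from the suspected number of classes.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Cluster ids are arbitrary. Rank the clusters by center so that
# label 0 is the smallest std-dev group and label 3 the largest.
order = np.argsort(kmeans.cluster_centers_.ravel())
rank = np.empty_like(order)
rank[order] = np.arange(order.size)
ordered_labels = rank[kmeans.labels_]

# Mean std-dev per ordered label, for a quick sanity check.
group_means = [std_dev[ordered_labels == k].mean() for k in range(4)]
print(group_means)
```

ordered_labels can then be used as the categorical target for the tree. With a single feature, k-means simply cuts the y axis into intervals, so it is worth checking the group means against a histogram of the std-dev before trusting the number of classes.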

Upvotes: 1
