ridctg

Reputation: 135

Clustering (x,y) coordinates in Matlab with ellipse around different density

I have some data points saved in a CSV file. Odd columns hold X values and even columns hold Y values. There are about 30 columns and 800 rows. I want to import those data points into Matlab and produce a visualization like the first image attached below. A single color is fine. My points look very similar to the ones in that image. I want to draw ellipses around points that are close together, with at least two ellipses per set of points, to show different densities.

Please help! If my question is not clear, just look at the image: I want something like that, but a single color is fine. All the points come from a single CSV file.

EDIT: I am using csvread to get the points from the CSV file. To read column by column I am using the following code, but I think there is a more efficient way to do this. As I said, I have 30 columns and about 800 rows.

b = csvread ('C:\Users\Riyadh\Desktop\ThumbTouch All\HeatMaps\Tap 3 Rig R.csv', 1, 0, [1 0 870-1 0 ] )

The second image shows what I have so far.

What I want

What I have

Upvotes: 1

Views: 2601

Answers (2)

Has QUIT--Anony-Mousse

Reputation: 77454

The image obviously is from Wikipedia: https://en.wikipedia.org/wiki/File:EM-Gaussian-data.svg

This page also says that the image was generated with ELKI; so why don't you try using ELKI instead of Matlab?

For this visualization to make sense, you need to use EM clustering. I'm pretty sure it already exists in Matlab, and it might even come with a similar visualization, known as a contour plot.
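As a rough sketch of what that could look like, assuming the Statistics and Machine Learning Toolbox is installed: fitgmdist fits a Gaussian mixture with EM, and contour can draw the fitted density. The sample data and the choice of two components are made up for illustration and are not taken from the question's CSV file.

```matlab
% Hypothetical 2-D sample data: two blobs standing in for real (x,y) points
rng(1);
X = [randn(300,2)*0.5 + 2; randn(300,2)*0.8 - 1];

k = 2;                   % assumed number of clusters
gm = fitgmdist(X, k);    % fit a Gaussian mixture model with EM

% Evaluate the mixture density on a grid covering the data
[gx, gy] = meshgrid(linspace(min(X(:,1)), max(X(:,1)), 100), ...
                    linspace(min(X(:,2)), max(X(:,2)), 100));
dens = reshape(pdf(gm, [gx(:) gy(:)]), size(gx));

% Points plus density contours
plot(X(:,1), X(:,2), '.');
hold on;
contour(gx, gy, dens);
hold off;
```

The contour levels here are whatever contour picks by default; passing an explicit list of levels would give control over how many rings appear per cluster.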

However, your data might be too small.

You have 30 dimensions. For the covariance matrices to be statistically sound you should have at least 3*d*d = 2700 rows as a rule of thumb, and that is per cluster. Judging from your plot, you have at least 16 clusters, so you should have around 50000 points, or far fewer dimensions.
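Spelled out as a quick calculation (the variable names are only for illustration, and the factor 3 is the rule of thumb above, not a hard limit):

```matlab
d = 30;                          % number of dimensions (columns)
k = 16;                          % clusters estimated from the plot
rowsPerCluster = 3 * d^2;        % 2700 rows per cluster
rowsTotal = k * rowsPerCluster;  % 43200 rows, i.e. on the order of 50000
```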

Last but not least, I don't know whether the contour plot makes any sense beyond 2 dimensions (maybe 3 if you have a 3D visualization).

Upvotes: 2

venergiac

Reputation: 7717

But what do the columns represent? I assume each pair of columns is one cluster.

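 % read every column, skipping the header row
 % (the range [1 0 870-1 0] would select column 0 only)
 M = csvread('C:\Users\Riyadh\Desktop\ThumbTouch All\HeatMaps\Tap 3 Rig R.csv', 1, 0);
 [n, m] = size(M);
 C = hsv(m);
 hold on;
 for i = 1:2:m-1
     % keep rows where both coordinates are positive (csvread fills empty cells with 0)
     idx = find((M(:,i) > 0) & (M(:,i+1) > 0));
     if numel(idx) < 2, continue; end   % std needs at least two points

     % evaluate center
     xc = mean(M(idx,i));
     yc = mean(M(idx,i+1));
     % evaluate standard deviation
     xs = std(M(idx,i));
     ys = std(M(idx,i+1));
     if xs == 0 || ys == 0, continue; end   % rectangle needs positive width and height

     % draw the 3-sigma ellipse, then the 2-sigma ellipse on top of it
     rectangle('Position', [xc-3*xs, yc-3*ys, 6*xs, 6*ys], 'Curvature', [1,1], 'EdgeColor', C(i,:), 'FaceColor', C(i,:));
     rectangle('Position', [xc-2*xs, yc-2*ys, 4*xs, 4*ys], 'Curvature', [1,1], 'EdgeColor', C(i,:)*0.9, 'FaceColor', C(i,:)*0.8);

     % plot points
     plot(M(idx,i), M(idx,i+1), '.', 'Color', C(i,:)*0.7);
 end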

The code:

  1. imports the data into M,
  2. takes the columns two at a time (x=M(:,i), y=M(:,i+1)),
  3. calculates the center and standard deviation of each pair,

and finally draws two ellipses and the points for each cluster.
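Because the code above uses only the per-axis standard deviations, its ellipses are always axis-aligned. If the points in a cluster are correlated, a rotated ellipse may fit better. Below is a minimal sketch of that variant using cov and eig; the sample data is made up and stands in for one (x,y) column pair, and the 2- and 3-sigma scales simply mirror the ones used above.

```matlab
% Hypothetical correlated sample data standing in for one (x,y) column pair
rng(0);
xy = randn(200,2) * [2 0.8; 0 0.5];
xy(:,1) = xy(:,1) + 5;
xy(:,2) = xy(:,2) + 3;

mu = mean(xy, 1);        % cluster center
S = cov(xy);             % 2x2 covariance matrix
[V, D] = eig(S);         % principal axes (columns of V) and variances (diag of D)

t = linspace(0, 2*pi, 200);
unitCircle = [cos(t); sin(t)];

plot(xy(:,1), xy(:,2), '.');
hold on;
for s = [2 3]
    % stretch the unit circle along the principal axes, then rotate it
    e = V * (s * sqrt(D)) * unitCircle;
    plot(e(1,:) + mu(1), e(2,:) + mu(2), '-');
end
hold off;
axis equal;
```

This draws outlines only; filled ellipses like the ones above would need patch or fill instead of plot.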

result

Upvotes: 1
