saad khan

Reputation: 33

How to generate medoid plots

Hi, I am using the partitioning around medoids (PAM) algorithm for clustering, via the pam function in the cluster package. The dataset I clustered has 4 attributes and seems to give around 6 clusters. I want to generate a plot of these clusters across those 4 attributes, like this [1]: http://www.flickr.com/photos/52099123@N06/7036003411/in/photostream/lightbox/ "Centroid plot"

But the only ways I have found to draw the clustering result are a dendrogram or the plot(data, col = result$clustering) command, which generates a plot similar to this [2]: http://www.flickr.com/photos/52099123@N06/7036003777/in/photostream "pam results".

Although the first image is a centroid plot, I am wondering whether there are any tools in R to do the same with a medoid plot. Note that it also prints the size of each cluster in the plot. It would be great to know whether any packages or solutions in R make this possible or, if not, what would be a good starting point for producing plots similar to the one in image 1.

Thanks

Hi all, I tried to work through the problem the way joran suggested, but I think I did not understand it correctly and have not done it the way it is supposed to be done. Anyway, this is what I have done so far. The file I tried to cluster looks like this:

 geneID         RPKM-base       RPKM-1cm        RPKM+4cm        RPKMtip  
GRMZM2G181227   3.412444267     3.16437442      1.287909035     0.037320722  
GRMZM2G146885   14.17287135     11.3577013      2.778514642     2.226818648  
GRMZM2G139463   6.866752401     5.373925806     1.388843962     1.062745344  
GRMZM2G015295   1349.446347     447.4635291     29.43627879     29.2643755  
GRMZM2G111909   47.95903081     27.5256729      1.656555758     0.949824883 
GRMZM2G078097   4.433627458     0.928492841     0.063329249     0.034255945  
GRMZM2G450498   36.15941083     9.45235616      0.700105077     0.194759794  
GRMZM2G413652   25.06985426     15.91342458     5.372151214     3.618914949     
GRMZM2G090087   21.00891969     18.02318412     17.49531186     10.74302155 

The following is the PAM clustering output:

GRMZM2G181227
1
GRMZM2G146885
2
GRMZM2G139463
2
GRMZM2G015295
2
GRMZM2G111909
2
GRMZM2G078097
3
GRMZM2G450498
3
GRMZM2G413652
2
GRMZM2G090087
2
AC217811.3_FG003
2

Using the above two files, I generated a third file that looks like this and holds the cluster information as a cluster type (K1, K2, etc.):

geneID  RPKM-base       RPKM-1cm        RPKM+4cm        RPKMtip Cluster_type
GRMZM2G181227   3.412444267     3.16437442      1.287909035     0.037320722     K1
GRMZM2G146885   14.17287135     11.3577013      2.778514642     2.226818648     K2
GRMZM2G139463   6.866752401     5.373925806     1.388843962     1.062745344     K2
GRMZM2G015295   1349.446347     447.4635291     29.43627879     29.2643755      K2
GRMZM2G111909   47.95903081     27.5256729      1.656555758     0.949824883     K2
GRMZM2G078097   4.433627458     0.928492841     0.063329249     0.034255945     K3
GRMZM2G450498   36.15941083     9.45235616      0.700105077     0.194759794     K3
GRMZM2G413652   25.06985426     15.91342458     5.372151214     3.618914949     K2
GRMZM2G090087   21.00891969     18.02318412     17.49531186     10.74302155     K2

I don't think this is the file that joran wanted me to create, but I could not think of anything else, so I ran lattice on the above file using the following code.

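library(lattice)
clusres <- read.table("clusinput.txt", header = TRUE, sep = "\t")
jpeg(filename = "clusplot.jpeg", width = 800, height = 1078,
     pointsize = 12, quality = 100, bg = "white", res = 100)
# parallelplot() is the current lattice name for parallel();
# print() makes sure the plot is drawn when the script is source()d
print(parallelplot(~ clusres[2:5] | Cluster_type, data = clusres,
                   horizontal.axis = FALSE))
dev.off()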

and I get a picture like this (image: parallel plot of the cluster).

Since I want a single line representing the whole cluster at the four different points, this output is wrong. Moreover, I tried playing with lattice, but I cannot figure out how to make it accept the RPKM values as the X coordinate. It always seems to plot many lines against a maximum or minimum value on the Y axis, and I don't understand what that is.

It would be great if anybody could help me out. Sorry if my question still seems absurd to you.

Upvotes: 2

Views: 5657

Answers (2)

Geek On Acid

Reputation: 6410

How about using clusplot from package cluster with partitioning around medoids? Here is a simple example (from the example section):

library(cluster)
# generate 25 objects, divided into 2 clusters
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
clusplot(pam(x, 2))  # `pam` does the partitioning
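If you also need the medoids and the cluster sizes (for example, to label a plot), the object returned by pam already carries them. A minimal sketch on the same kind of simulated data; the seed and the name `fit` are my own choices, not part of the package example:

```r
library(cluster)

set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
fit <- pam(x, 2)

fit$medoids             # one row per cluster: coordinates of its medoid
fit$clusinfo[, "size"]  # number of observations in each cluster
table(fit$clustering)   # the same sizes, counted from the label vector
```

The `medoids` matrix has one row per cluster and one column per variable, so it could serve as the single representative line per cluster.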


Upvotes: 1

joran

Reputation: 173677

I do not know of any pre-built functions that generate the plot you indicate, which looks to me like a sort of parallel coordinates plot.

But generating such a plot would be a fairly trivial exercise.

  1. Add a column of cluster labels (K1,K2, etc.) to your original data set, based on your clustering algorithm's output.

  2. Use one of the many, many tools in R for aggregating data (plyr, aggregate, etc.) to calculate the relevant summary statistics by cluster on each of the four variables. (You haven't said what the first graph is actually plotting. Mean and sd? Median and MAD?)

  3. Since you want the plots split into six separate panels, or facets, you will probably want to plot the data using either ggplot or lattice, both of which provide excellent support for creating the same plot, split across a single grouping vector (i.e. the clusters in your case).

But that's about as specific as anyone can get, given that you've provided so little information (i.e. no minimal runnable example, as recommended here).
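For what it's worth, the three steps above could be sketched roughly as follows. Everything about the data here is an assumption: `expr` is simulated (60 made-up genes, 4 conditions with invented column names), the summary statistic is the median, and the panels are drawn with lattice's xyplot. Swap in your own table and statistic.

```r
library(cluster)
library(lattice)

# Hypothetical stand-in for the real expression table: 60 genes, 4 conditions
set.seed(42)
expr <- as.data.frame(matrix(rexp(240, rate = 0.1), ncol = 4))
names(expr) <- c("base", "minus1cm", "plus4cm", "tip")
conds <- names(expr)

k <- 6
fit <- pam(expr, k)

# Step 1: add a column of cluster labels K1..K6
expr$cluster <- factor(paste0("K", fit$clustering), levels = paste0("K", 1:k))

# Step 2: one summary value per cluster and variable (median is an assumption)
centers <- aggregate(. ~ cluster, data = expr, FUN = median)
sizes <- table(expr$cluster)

# Reshape to long format: one row per cluster and condition
long <- data.frame(
  cluster   = rep(centers$cluster, times = length(conds)),
  condition = factor(rep(conds, each = k), levels = conds),
  value     = unlist(centers[, conds], use.names = FALSE)
)

# Put the cluster size into each panel title
levels(long$cluster) <- paste0(levels(long$cluster),
                               " (n = ", as.vector(sizes), ")")

# Step 3: one panel per cluster, a single line across the four conditions
p <- xyplot(value ~ condition | cluster, data = long, type = "b",
            scales = list(x = list(rot = 45)),
            xlab = "Condition", ylab = "Median value")
print(p)
```

If you would rather draw the medoids themselves than a per-cluster median, `fit$medoids` (a matrix with one row per cluster) could be reshaped the same way in place of `centers`.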

Upvotes: 4
