Dinesh

Reputation: 663

Grouping similar values in R

I have tab-delimited text files, each with two columns but a different number of rows (e.g. 2022, 1765, 834, etc.). An excerpt of the files is given below:

  ProbeID  A.Signal  ProbeID  B.Signal  ProbeID  C.Signal  ProbeID  D.Signal
  13567    163.452   41235    145.678   34562    145.225   12456    143.215
  3452     175.345   42563    231.678   52136    167.322   67842    456.178
  1358     189.321   31256    193.564   15678    189.356   35134    167.324
  46345    234.567   25672    456.124   14578    456.234   18764    234.125
  65623    156.234                      96432    125.678   7821     145.678
  86512    178.321                      45677    896.234
                                        45677    143.896

Now I want to find the ProbeIDs across all files that have similar Signal values and create a heatmap from them. Any help is appreciated; I can provide extra data if required.

Upvotes: 0

Views: 880

Answers (2)

BenBarnes

Reputation: 19454

The subset of the data you provided does not include any ProbeIDs that recur across files (45677 appears twice, but only within the C file). However, if the real data does, this answer might be of interest.

If you want to merge the data in the text files by ProbeID, based on the Q&A I referenced in the comment (thanks @GGrothendieck):

df1<-data.frame(ProbeID=c(13567,3452,1358,46345,65623,86512),
  A.Signal=c(163.452,175.345,189.321,234.567,156.234,178.321))

df2<-data.frame(ProbeID=c(41235,42563,31256,25672),
  B.Signal=c(145.678,231.678,193.564,456.124))

df3<-data.frame(ProbeID=c(34562,52136,15678,14578,96432,45677,45677),
  C.Signal=c(145.225,167.322,189.356,456.234,125.678,896.234,143.896))

df4<-data.frame(ProbeID=c(12456,67842,35134,18764,7821),
  D.Signal=c(143.215,456.178,167.324,234.125,145.678))

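# Number repeated ProbeIDs within each data frame (1, 2, ...) so duplicates,
# such as 45677 in df3, get distinct merge keys
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

L <- list(df1, df2, df3, df4)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$ProbeID)))

# Full outer join on ProbeID and run.seq, then drop the run.seq column
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]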
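With the real data, the data frames would be read from the text files rather than typed in. A minimal sketch, assuming each file is tab-delimited with a header row (ProbeID plus one signal column); the real file names are not known, so this example writes two small temporary files first to stay self-contained:

```r
# Demo only: write two small tab-delimited files so the sketch runs
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
write.table(data.frame(ProbeID = c(13567, 3452), A.Signal = c(163.452, 175.345)),
            f1, sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(ProbeID = c(41235, 42563, 31256),
                       B.Signal = c(145.678, 231.678, 193.564)),
            f2, sep = "\t", row.names = FALSE, quote = FALSE)

# Read every file into a list of data frames; files with different
# numbers of rows are fine because each becomes its own data frame
files <- c(f1, f2)
L <- lapply(files, read.delim)

# Same run.seq / merge steps as in the answer
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$ProbeID)))
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]
out
```

For the real data, `files` would simply be the vector of paths to your text files (for example from `list.files()`).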

The object out will then be a data.frame that you can analyze, for example, by finding the mean of the signals for each probe.

out$theRowMean<-rowMeans(out[,grep("Signal",names(out))],na.rm=TRUE)

theProbeMeans<-tapply(out$theRowMean,out$ProbeID,mean)

Upvotes: 0

Rui

Reputation: 71

What you could do is combine the files into a single file with three columns:

Probe.ID | Signal  | Type
13567    | 163.452 | A
41235    | 145.678 | B
...
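A minimal sketch of building that three-column table in R, assuming each file has already been read into a two-column data frame such as df1 and df2 in the other answer (ProbeID plus a column named like A.Signal). The helper to_long is hypothetical, and only the first two rows of each data frame are used for brevity:

```r
# First rows of df1 and df2 from the other answer
df1 <- data.frame(ProbeID = c(13567, 3452), A.Signal = c(163.452, 175.345))
df2 <- data.frame(ProbeID = c(41235, 42563), B.Signal = c(145.678, 231.678))

# Hypothetical helper: turn one two-column data frame into
# Probe.ID / Signal / Type, taking Type from the signal column name
to_long <- function(d) {
  data.frame(Probe.ID = d[[1]],
             Signal   = d[[2]],
             Type     = sub("\\.Signal$", "", names(d)[2]))
}

# Stack all files into one long table
long <- do.call(rbind, lapply(list(df1, df2), to_long))
long
```

`write.table(long, ..., sep = "\t")` would then save it as a single file if you need one on disk.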

Then the separate files at least share one format. From there you can choose one of the many clustering methods used in expression data analysis. R has built-in clustering functions (e.g. hclust, kmeans).
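For example, `kmeans()` and `hclust()` from the base stats package can both group similar Signal values in a long-format table like the one described above. A minimal sketch on a small hand-picked subset of the excerpt; the number of clusters (2) is an arbitrary assumption you would tune for the real data:

```r
# Small subset of the excerpt in Probe.ID / Signal / Type form
long <- data.frame(
  Probe.ID = c(13567, 3452, 41235, 42563, 34562, 12456),
  Signal   = c(163.452, 175.345, 145.678, 231.678, 145.225, 143.215),
  Type     = c("A", "A", "B", "B", "C", "D")
)

# k-means on the single Signal column; set.seed makes the random start reproducible
set.seed(1)
km <- kmeans(long$Signal, centers = 2)
long$kmeans_cluster <- km$cluster

# Hierarchical clustering (complete linkage by default) on the same values,
# cut into 2 groups
hc <- hclust(dist(long$Signal))
long$hclust_cluster <- cutree(hc, k = 2)

long
```

Probes that share a cluster label have similar signals under that method; comparing the two label columns is a quick way to see where the methods disagree.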

My advice is to pick a few clustering algorithms in R and try them out on your data. Plot a heatmap for each clustering algorithm and compare them. Most importantly, understand how each clustering algorithm works.
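A minimal sketch of the "one heatmap per clustering algorithm" idea using base R's `heatmap()`, which reorders rows and columns with the function passed as `hclustfun`. The matrix here is random placeholder data (10 hypothetical probes by samples A to D), not real measurements; you would replace it with your own probe-by-sample signal matrix, which needs no missing values for `hclust()` to work:

```r
# Placeholder matrix of random signals; rows = hypothetical probes, cols = samples
set.seed(42)
m <- matrix(rnorm(40, mean = 200, sd = 50), nrow = 10,
            dimnames = list(paste0("probe", 1:10), c("A", "B", "C", "D")))

# Draw one heatmap per agglomeration method; heatmap() invisibly returns
# the row and column orderings it used
methods <- c("complete", "average")
res <- lapply(methods, function(method) {
  heatmap(m, hclustfun = function(d) hclust(d, method = method),
          scale = "none", main = paste("hclust:", method))
})
names(res) <- methods
```

Comparing `res$complete$rowInd` with `res$average$rowInd` shows whether the two methods order the probes the same way.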

Upvotes: 1
