Reputation: 663
I have tab-delimited text files, each with two columns but a different number of rows (e.g. 2022, 1765, 834, etc.). An excerpt of the files is given below:
ProbeID  A.Signal  ProbeID  B.Signal  ProbeID  C.Signal  ProbeID  D.Signal
13567    163.452   41235    145.678   34562    145.225   12456    143.215
3452     175.345   42563    231.678   52136    167.322   67842    456.178
1358     189.321   31256    193.564   15678    189.356   35134    167.324
46345    234.567   25672    456.124   14578    456.234   18764    234.125
65623    156.234                      96432    125.678   7821     145.678
86512    178.321                      45677    896.234
                                      45677    143.896
Now I want to find the ProbeIDs across all files that have similar Signal values and create a heatmap from them. Please help me. I can also provide extra data if required.
Upvotes: 0
Views: 880
Reputation: 19454
The subset of the data you provided does not include any recurring ProbeIDs. However, if the real data does, this answer might be of interest.
To merge the data in the text files by ProbeID, you can adapt the approach from the Q&A I referenced in the comment (thanks @GGrothendieck):
# Example data frames standing in for the four tab-delimited files
df1 <- data.frame(ProbeID = c(13567, 3452, 1358, 46345, 65623, 86512),
                  A.Signal = c(163.452, 175.345, 189.321, 234.567, 156.234, 178.321))
df2 <- data.frame(ProbeID = c(41235, 42563, 31256, 25672),
                  B.Signal = c(145.678, 231.678, 193.564, 456.124))
df3 <- data.frame(ProbeID = c(34562, 52136, 15678, 14578, 96432, 45677, 45677),
                  C.Signal = c(145.225, 167.322, 189.356, 456.234, 125.678, 896.234, 143.896))
df4 <- data.frame(ProbeID = c(12456, 67842, 35134, 18764, 7821),
                  D.Signal = c(143.215, 456.178, 167.324, 234.125, 145.678))
# Number repeated ProbeIDs within a data frame (1, 2, ...) so duplicates stay on separate rows
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L <- list(df1, df2, df3, df4)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$ProbeID)))
# Full outer join on ProbeID and run.seq, then drop the helper run.seq column (column 2)
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]
The object out will then be a data.frame that you can analyze, for example, by finding the mean of the signals for each probe.
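# Mean signal per row, ignoring files in which the probe is absent
out$theRowMean <- rowMeans(out[, grep("Signal", names(out))], na.rm = TRUE)
# Average over repeated rows so there is one mean per ProbeID
theProbeMeans <- tapply(out$theRowMean, out$ProbeID, mean)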
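Going one step beyond the per-probe mean, here is a minimal sketch of one possible reading of "similar Signal values": keep the probes that occur in every file and whose signals vary little across files, then draw a heatmap of them. The table `wide`, its values, and the 0.10 cutoff are all made up for illustration (the excerpt in the question has no ProbeID shared between files), and the coefficient of variation is only one of several reasonable similarity measures.

```r
# Made-up merged table: one row per ProbeID, one column per file,
# NA where a probe is absent from a file (hypothetical values)
wide <- data.frame(
  ProbeID  = c(13567, 3452, 1358, 46345, 65623),
  A.Signal = c(163.4, 175.3, 189.3, 234.6, 156.2),
  B.Signal = c(160.1, 231.7, 193.6,    NA, 150.9),
  C.Signal = c(165.0, 167.3, 189.4, 456.2, 149.8),
  D.Signal = c(158.7, 456.2, 167.3, 234.1, 161.0)
)

# Numeric matrix of the signal columns, labelled by ProbeID
sig <- as.matrix(wide[, grep("Signal", names(wide))])
rownames(sig) <- wide$ProbeID

# Keep only probes measured in every file
sig <- sig[complete.cases(sig), , drop = FALSE]

# Coefficient of variation per probe: small values mean similar signals
cv <- apply(sig, 1, sd) / rowMeans(sig)
similar <- sig[cv < 0.10, , drop = FALSE]   # 0.10 is an arbitrary cutoff

# Draw the heatmap into a PDF in the session's temporary directory
pdf(file.path(tempdir(), "similar_probes_heatmap.pdf"))
heatmap(similar, scale = "none")
invisible(dev.off())
```

With real data, `wide` would be built from the merged object (one row per ProbeID), and the cutoff would need to be chosen by looking at the distribution of `cv`.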
Upvotes: 0
Reputation: 71
What you could do is to create a file with three columns:
Probe.ID | Signal  | Type
13567    | 163.452 | A
41235    | 145.678 | B
...
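A minimal sketch of how such a three-column table could be built in R, assuming each file has already been read into a two-column data frame (ID first, signal second). The names `dfA`, `dfB`, `files`, and `long` are hypothetical, and only the first rows of two files from the question are used.

```r
# Two small per-file tables, using the first rows of files A and B from the question
dfA <- data.frame(ProbeID = c(13567, 3452), A.Signal = c(163.452, 175.345))
dfB <- data.frame(ProbeID = c(41235, 42563), B.Signal = c(145.678, 231.678))

# Named list: the names become the Type column
files <- list(A = dfA, B = dfB)

# Stack the per-file tables into Probe.ID / Signal / Type rows
long <- do.call(rbind, lapply(names(files), function(type) {
  d <- files[[type]]
  data.frame(Probe.ID = d[[1]],   # first column holds the probe ID
             Signal   = d[[2]],   # second column holds the signal
             Type     = type,
             stringsAsFactors = FALSE)
}))

print(long)
```

For real files, each list element could come from `read.delim()` on the corresponding path instead of a hand-typed data frame.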
Then you at least have the separate files in one format. With this you can choose one of the many clustering methods that have been used in expression data analysis. R has built-in clustering functions (e.g. hclust, kmeans).
My advice is to pick a few clustering algorithms in R and try them out on your data. Plot a heatmap for each clustering algorithm and compare them. Most importantly, understand how each clustering algorithm works.
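As a rough sketch of that comparison, the example below runs `hclust` and `kmeans` on a small made-up probe-by-file matrix and draws one heatmap per method. The matrix `m`, the probe labels `p1` to `p5`, and the choice of two clusters are assumptions for illustration only; real data would also need its missing values handled first, since `dist()` and `kmeans()` do not cope well with NA.

```r
set.seed(1)  # kmeans uses random starting centres

# Made-up signals: rows are probes, columns are files (hypothetical values)
m <- rbind(
  p1 = c(150, 155, 148, 152),
  p2 = c(160, 158, 163, 161),
  p3 = c(450, 455, 460, 452),
  p4 = c(440, 445, 448, 451),
  p5 = c(155, 150, 157, 149)
)
colnames(m) <- c("A", "B", "C", "D")

# Hierarchical clustering (complete linkage by default), cut into 2 groups
hc <- hclust(dist(m))
hc_groups <- cutree(hc, k = 2)

# k-means with 2 centres and several random restarts
km <- kmeans(m, centers = 2, nstart = 10)
km_groups <- km$cluster

# Cross-tabulate the two assignments to see where the methods agree
agreement <- table(hclust = hc_groups, kmeans = km_groups)
print(agreement)

# One heatmap per method, written to a PDF in the temporary directory
pdf(file.path(tempdir(), "cluster_heatmaps.pdf"))
heatmap(m, Rowv = as.dendrogram(hc), Colv = NA, scale = "none", main = "hclust")
heatmap(m[order(km_groups), ], Rowv = NA, Colv = NA, scale = "none", main = "kmeans")
invisible(dev.off())
```

The cluster numbers themselves are arbitrary labels, so the cross-table is read by which rows and columns share counts, not by whether the numbers match.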
Upvotes: 1