Reputation: 411
Hi, I have a list like this:
$`2`
chr.pos nt.pos CNV GRP
1 783605 1 2
1 888149 1 2
1 991311 1 2
1 1089305 1 2
1 1177669 1 2
$`4`
chr.pos nt.pos CNV GRP
2 1670488 1 4
2 1758800 1 4
$`6`
chr.pos nt.pos CNV GRP
2 1902924 1 6
2 1978088 1 6
For each element, I want to extract the unique chromosome, the CNV, the group, and the lowest and highest nt.pos. I would prefer the output as a data frame, like this:
chr.pos Start End GRP
1 783605 1177669 2
2 1670488 1758800 4
2 1902924 1978088 6
I tried this:
results<-lapply(mylist, function(x){
return(as.data.frame(unique(x$chr.pos),range(x$nt.pos)[1],range(x$nt.pos) [2],unique(x$GRP)))
}
)
But of course, what I got is a list.
Could you help me please?
Upvotes: 0
Views: 180
Reputation: 81693
This does the trick (assuming dat is your list of data frames):
structure(
as.data.frame(cbind(do.call(rbind,
lapply(dat,
function(x) c(x[["chr.pos"]][1],
range(x[["nt.pos"]])))),
as.numeric(names(dat)))),
.Names = c("chr.pos", "Start", "End", "GRP"))
# chr.pos Start End GRP
# 2 1 783605 1177669 2
# 4 2 1670488 1758800 4
# 6 2 1902924 1978088 6
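The nested call above may be easier to follow when split into steps. Here is a minimal sketch of the same idea, assuming dat is built from the sample data in the question; the intermediate names long, rows, mat and res are only for illustration.

```r
# Rebuild the example list from the question's data (assumed layout)
long <- read.table(header = TRUE, text = "chr.pos nt.pos CNV GRP
1 783605 1 2
1 888149 1 2
1 991311 1 2
1 1089305 1 2
1 1177669 1 2
2 1670488 1 4
2 1758800 1 4
2 1902924 1 6
2 1978088 1 6")
dat <- split(long, long$GRP)

# Step 1: one vector per list element: chromosome, lowest and highest nt.pos
rows <- lapply(dat, function(x) c(x[["chr.pos"]][1], range(x[["nt.pos"]])))

# Step 2: stack the vectors into a 3-column matrix, one row per list element
mat <- do.call(rbind, rows)

# Step 3: name the columns and add the group, taken from the list names
res <- data.frame(chr.pos = mat[, 1],
                  Start   = mat[, 2],
                  End     = mat[, 3],
                  GRP     = as.numeric(names(dat)))
res
```

Note that GRP comes from names(dat) here, so this only works if the list names really are the group values, as they are after split().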
Upvotes: 1
Reputation: 193527
Assuming that your list is named "dat", as below:
dat <- read.table(header = TRUE, text = "chr.pos nt.pos CNV GRP
1 783605 1 2
1 888149 1 2
1 991311 1 2
1 1089305 1 2
1 1177669 1 2
2 1670488 1 4
2 1758800 1 4
2 1902924 1 6
2 1978088 1 6")
dat <- split(dat, dat$GRP)
First, a question: do you really need it as a list, or can it just be a long data.frame? If it has to remain a list, perhaps try the following:
sapply()
data.frame(t(sapply(dat, function(x)
data.frame(chr.pos = unique(x["chr.pos"]),
Start = min(x["nt.pos"]),
End = max(x["nt.pos"]),
GRP = unique(x["GRP"])))))
lapply()
do.call(rbind, lapply(dat, function(x)
data.frame(chr.pos = unique(x["chr.pos"]),
Start = min(x["nt.pos"]),
End = max(x["nt.pos"]),
GRP = unique(x["GRP"]))))
Both will result in:
# chr.pos Start End GRP
# 2 1 783605 1177669 2
# 4 2 1670488 1758800 4
# 6 2 1902924 1978088 6
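One detail that may matter later: x["nt.pos"] returns a one-column data frame, while x[["nt.pos"]] returns a plain vector. Below is a sketch of the lapply() variant written with [[ ]], assuming the same sample data; the names long and res are only for illustration.

```r
# Rebuild the example list from the sample data (assumed layout)
long <- read.table(header = TRUE, text = "chr.pos nt.pos CNV GRP
1 783605 1 2
1 888149 1 2
1 991311 1 2
1 1089305 1 2
1 1177669 1 2
2 1670488 1 4
2 1758800 1 4
2 1902924 1 6
2 1978088 1 6")
dat <- split(long, long$GRP)

# Build a one-row data frame per list element, then stack the rows.
# [[ ]] extracts each column as a vector, so every value is a plain scalar.
res <- do.call(rbind, lapply(dat, function(x)
  data.frame(chr.pos = unique(x[["chr.pos"]]),
             Start   = min(x[["nt.pos"]]),
             End     = max(x[["nt.pos"]]),
             GRP     = unique(x[["GRP"]]))))

# str() shows the type of each column in the result
str(res)
```

Running str() on the result is a quick way to confirm that every column is an ordinary vector rather than a list, which tends to matter for later steps such as writing the result to a file.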
Second, if it can be a long data.frame, then explore data.table():
library(data.table)
DaT <- data.table(do.call(rbind, dat), key = "GRP")
DaT[, list(chr.pos = unique(chr.pos),
Start = min(nt.pos),
End = max(nt.pos)), by = key(DaT)]
# GRP chr.pos Start End
# 1: 2 1 783605 1177669
# 2: 4 2 1670488 1758800
# 3: 6 2 1902924 1978088
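If adding a package is not an option, the same grouping on a long data.frame can probably be done in base R with aggregate(). This is a swapped-in alternative, not the data.table approach above; a minimal sketch assuming the same sample data, with long, rng and out as illustrative names:

```r
# Long data frame with the sample data (assumed layout)
long <- read.table(header = TRUE, text = "chr.pos nt.pos CNV GRP
1 783605 1 2
1 888149 1 2
1 991311 1 2
1 1089305 1 2
1 1177669 1 2
2 1670488 1 4
2 1758800 1 4
2 1902924 1 6
2 1978088 1 6")

# range() returns two values per group, so rng$nt.pos is a 2-column matrix
rng <- aggregate(nt.pos ~ chr.pos + GRP, data = long, FUN = range)

# Flatten the matrix column into separate Start and End columns
out <- data.frame(chr.pos = rng$chr.pos,
                  Start   = rng$nt.pos[, 1],
                  End     = rng$nt.pos[, 2],
                  GRP     = rng$GRP)
out
```

Because range() returns a pair, the nt.pos column comes back as a matrix, which is why the last step splits it into Start and End.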
Upvotes: 3
Reputation: 411
Thanks Sven,
I did it in a similar way, using this:
N <- length(results)
# Pre-allocate one row per list element
DF <- data.frame(chr = rep(NA, N), Start = rep(NA, N), End = rep(NA, N), Group = rep(NA, N), stringsAsFactors = FALSE)
for (i in seq_len(N)) {
  # range() already returns c(lowest, highest), so it fills Start and End
  DF[i, ] <- c(unique(results[[i]]$chr.pos), range(results[[i]]$nt.pos), unique(results[[i]]$GRP))
}
Upvotes: 0