Store values from for loop in R

Question

I have a large dataframe with scaffold annotations (example rows):

gff <- data.frame(seqid = c("Scaffold21", "Scaffold21", "Scaffold21", "Scaffold31", "Scaffold31", "Scaffold11561", "Scaffold11561"),
                  start = c(4179, 16947, 18411, 25986, 45575, 52, 54100),
                  end = c(4697, 17667, 19643, 32223, 46657, 1572, 54627),
                  attributes = c("tRNA", "sRNA", "exon", "rRNA", "mRNA", "mRNA", "exon"))

I also have another dataframe with RNA coordinates (example rows):

RNA <- data.frame(seqid = c("Scaffold21", "Scaffold11561"),
                  start = c(17047,1380))

I've been trying to filter the first dataframe to annotate the RNAs in the second one using:

scaffold <- unique(RNA$seqid)
coord <- RNA$start
n <- length(scaffold)*length(coord)
output <- matrix(ncol = ncol(gff), nrow = n)
myfunc <- function(x,y){gff[gff$seqid == x & gff$start <= y & gff$end >= y,]}

for (x in scaffold) {
  for (y in coord) {
    test = myfunc(x, y)
    output <- test
  }
}

The problem is that only the result for the last x,y pair is stored. I'd really appreciate it if someone could help me fix this.

The output that I'm getting now looks like:

seqid	start	end	attributes
Scaffold11561	52	1572	mRNA

Ideally, it would look like:

seqid	start	end
Scaffold21	16947	17667
Scaffold11561	52	1572

mmn · Accepted Answer

Given your sample code, you could use something like:

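scaffold <- unique(RNA$seqid)
coord <- RNA$start
n <- length(scaffold) * length(coord)
# Preallocate a data frame rather than a matrix: a matrix can only store one type
output <- data.frame(matrix(ncol = ncol(gff), nrow = n))
names(output) <- names(gff)  # keep the column names of gff
myfunc <- function(x, y) {
  gff[gff$seqid == x & gff$start <= y & gff$end >= y, ]
}

i <- 0L  # index of the row of output to fill

for (x in scaffold) {
  for (y in coord) {
    i <- i + 1L
    test <- myfunc(x, y)
    if (nrow(test) != 1) next  # skip pairs with no match or more than one match
    output[i, ] <- test
  }
}
output <- na.omit(output)  # drop the rows that were never filled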
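
If a coordinate can fall inside more than one feature, the check on `nrow(test)` above drops it. A variation, sketched here under the assumption that each `start` in `RNA` belongs to the `seqid` on the same row (so the loop runs over the rows of `RNA` instead of every scaffold/coordinate combination), is to collect the matches in a list and bind them once at the end. The name `hits` is only illustrative.

```r
# Sample data copied from the question.
gff <- data.frame(
  seqid = c("Scaffold21", "Scaffold21", "Scaffold21", "Scaffold31",
            "Scaffold31", "Scaffold11561", "Scaffold11561"),
  start = c(4179, 16947, 18411, 25986, 45575, 52, 54100),
  end = c(4697, 17667, 19643, 32223, 46657, 1572, 54627),
  attributes = c("tRNA", "sRNA", "exon", "rRNA", "mRNA", "mRNA", "exon"),
  stringsAsFactors = FALSE
)
RNA <- data.frame(seqid = c("Scaffold21", "Scaffold11561"),
                  start = c(17047, 1380),
                  stringsAsFactors = FALSE)

# One list slot per row of RNA (assumption: seqid and start are paired by row).
hits <- vector("list", nrow(RNA))
for (i in seq_len(nrow(RNA))) {
  x <- RNA$seqid[i]
  y <- RNA$start[i]
  # Keep every gff row on scaffold x whose interval contains y.
  hits[[i]] <- gff[gff$seqid == x & gff$start <= y & gff$end >= y, ]
}

# Stack the pieces; empty data frames contribute no rows.
output <- do.call(rbind, hits)
print(output)
```

Like the nested loop above, this still visits the rows one at a time.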

This is probably slow if you have a lot of rows. You could also think about using a join. For example:

a <- merge(gff, RNA, by = "seqid")
a[(a$start.x <= a$start.y) & (a$end >= a$start.y), ]
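
To see the join on the sample data from the question, here is a self-contained sketch. The names `hit` and `result` are illustrative; `start.x` and `start.y` are the suffixes `merge()` adds by default because both data frames have a `start` column. Note that `merge()` sorts by the `by` column by default, so the row order may differ from the order in `RNA`.

```r
# Sample data copied from the question.
gff <- data.frame(
  seqid = c("Scaffold21", "Scaffold21", "Scaffold21", "Scaffold31",
            "Scaffold31", "Scaffold11561", "Scaffold11561"),
  start = c(4179, 16947, 18411, 25986, 45575, 52, 54100),
  end = c(4697, 17667, 19643, 32223, 46657, 1572, 54627),
  attributes = c("tRNA", "sRNA", "exon", "rRNA", "mRNA", "mRNA", "exon"),
  stringsAsFactors = FALSE
)
RNA <- data.frame(seqid = c("Scaffold21", "Scaffold11561"),
                  start = c(17047, 1380),
                  stringsAsFactors = FALSE)

# Pair every gff row with the RNA rows that share its seqid.
# start.x is the feature start from gff, start.y is the RNA coordinate.
a <- merge(gff, RNA, by = "seqid")

# Keep the pairs where the RNA coordinate lies inside the feature.
hit <- a[a$start.x <= a$start.y & a$end >= a$start.y, ]

# Rebuild the column layout asked for in the question.
result <- data.frame(seqid = hit$seqid,
                     start = hit$start.x,
                     end = hit$end,
                     attributes = hit$attributes,
                     stringsAsFactors = FALSE)
print(result)
```

Because the scaffold match happens inside `merge()`, a coordinate is only ever compared with features on its own scaffold.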
