Maria Cecilia

Store values from for loop in R

I have a large dataframe with scaffold annotations (example rows):

 gff <- data.frame(seqid = c("Scaffold21", "Scaffold21", "Scaffold21", "Scaffold31", "Scaffold31", "Scaffold11561", "Scaffold11561"),
                    start = c(4179,16947,18411,25986,45575, 52,54100),
                    end = c(4697,17667,19643,32223,46657,1572,54627),
                    attributes = c("tRNA","sRNA","exon","rRNA","mRNA","mRNA","exon"))

And I have another dataframe with RNA coordinates (example rows):

RNA <- data.frame(seqid = c("Scaffold21", "Scaffold11561"),
                  start = c(17047,1380))

I've been trying to filter the first dataframe to annotate the RNAs in the second one using:

scaffold <- unique(RNA$seqid)
coord <- RNA$start
n <- length(scaffold)*length(coord)
output <- matrix(ncol = ncol(gff), nrow = n)
myfunc <- function(x,y){gff[gff$seqid == x & gff$start <= y & gff$end >= y,]}

for (x in scaffold) {
  for (y in coord) {
    test = myfunc(x, y)
    output <- test
  }
}

The problem is that only the result for the last x, y pair is stored. I'd really appreciate it if someone could help me fix this.

The output that I'm getting now looks like:

seqid start end attributes
Scaffold11561 52 1572 mRNA

Ideally, it would look like:

seqid start end
Scaffold21 16947 17667
Scaffold11561 52 1572

Answers (1)

mmn

Given your sample code, you could use something like:

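scaffold <- unique(RNA$seqid)
coord <- RNA$start
n <- length(scaffold)*length(coord)
output <- data.frame(matrix(ncol = ncol(gff), nrow = n)) # a matrix can only store one type
names(output) <- names(gff)                              # keep gff's column names instead of X1..X4
myfunc <- function(x,y){gff[gff$seqid == x & gff$start <= y & gff$end >= y,]}

i <- 0L  # row counter into the preallocated output

for (x in scaffold) {
  for (y in coord) {
    i <- i + 1L                  # one output row per (x, y) pair
    test <- myfunc(x, y)
    if(nrow(test) != 1) next     # skip pairs with no match (or with several matches)
    output[i, ] <- test          # write into row i instead of overwriting output
  }
}
output <- na.omit(output)        # drop rows that were never filled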
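The nested loops pair every scaffold with every start coordinate, so an RNA's position can also be checked against features on a different scaffold. A minimal sketch of an alternative, assuming each row of `RNA` should only be matched against its own scaffold: loop over the rows of `RNA`, keep each result in a list, and bind everything once at the end. It uses the example data from the question; `matches` and `output` are illustrative names.

```r
gff <- data.frame(seqid = c("Scaffold21", "Scaffold21", "Scaffold21", "Scaffold31",
                            "Scaffold31", "Scaffold11561", "Scaffold11561"),
                  start = c(4179, 16947, 18411, 25986, 45575, 52, 54100),
                  end = c(4697, 17667, 19643, 32223, 46657, 1572, 54627),
                  attributes = c("tRNA", "sRNA", "exon", "rRNA", "mRNA", "mRNA", "exon"))
RNA <- data.frame(seqid = c("Scaffold21", "Scaffold11561"),
                  start = c(17047, 1380))

# One list element per RNA row: every gff feature on the same scaffold
# whose [start, end] interval contains that RNA's start position
matches <- lapply(seq_len(nrow(RNA)), function(i) {
  gff[gff$seqid == RNA$seqid[i] &
        gff$start <= RNA$start[i] &
        gff$end >= RNA$start[i], ]
})

# Bind all pieces in a single step instead of growing an object inside a loop
output <- do.call(rbind, matches)
rownames(output) <- NULL
output
```

RNAs with no overlapping feature simply contribute zero rows, and RNAs that fall inside several features keep all of them, whereas the `nrow(test) != 1` check above skips such cases.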

This is probably slow if you have a lot of rows. You could also consider using a join, for example:

a <- merge(gff, RNA, by = "seqid")  # gff start becomes start.x, RNA start becomes start.y
a[(a$start.x <= a$start.y) & (a$end >= a$start.y), ]  # RNA start inside the feature interval
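A self-contained sketch of the join, assuming the example data from the question. Because both inputs have a `start` column, `merge()` renames them to `start.x` (from gff) and `start.y` (from RNA); the last step keeps only the three columns of the desired output. `merge()` first builds every gff/RNA pair on the same scaffold, so memory use grows with the number of features per scaffold. `hits` and `result` are illustrative names.

```r
gff <- data.frame(seqid = c("Scaffold21", "Scaffold21", "Scaffold21", "Scaffold31",
                            "Scaffold31", "Scaffold11561", "Scaffold11561"),
                  start = c(4179, 16947, 18411, 25986, 45575, 52, 54100),
                  end = c(4697, 17667, 19643, 32223, 46657, 1572, 54627),
                  attributes = c("tRNA", "sRNA", "exon", "rRNA", "mRNA", "mRNA", "exon"))
RNA <- data.frame(seqid = c("Scaffold21", "Scaffold11561"),
                  start = c(17047, 1380))

# Pair every gff feature with every RNA on the same scaffold;
# the shared non-key column `start` gets the default suffixes .x (gff) and .y (RNA)
a <- merge(gff, RNA, by = "seqid")

# Keep pairs where the RNA start lies inside the feature interval
hits <- a[a$start.x <= a$start.y & a$end >= a$start.y, ]

# Reduce to the columns of the desired output
result <- data.frame(seqid = hits$seqid, start = hits$start.x, end = hits$end)
result
```

`merge()` sorts by the key column by default, so the rows may come back in a different order from `RNA`.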
