shaimaa Hassan

Reputation: 7

How can I write a loop in R to repeat this code for each row?

# gg is my data frame; it contains 1000 different seqname values

seqname       start  end
 scaffold02439   843 1180
 scaffold02439   928 1180
 scaffold02439  3560 3672
 scaffold02439  3560 3672
 scaffold02439  5525 5666
 scaffold02439  5525 5666

# I want to take the start and end values of each seqname

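library(dplyr)                            # needed for %>% and filter()

c <- unique(gg$seqname)                   # all distinct seqname values
# first seqname: start of its first row, end of its last row
l1 <- gg %>% filter(seqname %in% c[1])
l1a <- l1[1, 2]
l1b <- l1[nrow(l1), 3]
l1 <- cbind(c[1], l1a, l1b)
# second seqname: same steps repeated by hand
l2 <- gg %>% filter(seqname %in% c[2])
l2a <- l2[1, 2]
l2b <- l2[nrow(l2), 3]
l2 <- cbind(c[2], l2a, l2b)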

How can I write this as a loop that, for each seqname, extracts its rows and then takes the start (second column) of the first row and the end (third column) of the last row?

Upvotes: 1

Views: 44

Answers (2)

akrun

Reputation: 887691

We can also use data.table

library(data.table)
setDT(gg)[, .(start = first(start), end = last(end)), seqname]
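To see what this does, here is a small sketch run on the six sample rows from the question. The data frame is rebuilt by hand and the result name `res` is just for illustration. Note that `setDT()` converts `gg` to a data.table by reference, so `gg` itself is changed.

```r
library(data.table)

# Sample rows copied from the question
gg <- data.frame(
  seqname = rep("scaffold02439", 6),
  start   = c(843, 928, 3560, 3560, 5525, 5525),
  end     = c(1180, 1180, 3672, 3672, 5666, 5666)
)

# setDT() converts gg to a data.table in place (no copy is made)
setDT(gg)

# One row per seqname: start of the first row, end of the last row
res <- gg[, .(start = first(start), end = last(end)), by = seqname]
print(res)
```

For this sample the result is a single row with start 843 and end 5666.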

Upvotes: 0

Andrew Gustar

Reputation: 18425

You could do something like this

library(dplyr)

gg %>%
  group_by(seqname) %>%                               # group by seqname
  summarise(start = first(start), end = last(end))    # extract first and last

This returns the start of the first row and the end of the last row within each seqname, which are not necessarily the minimum and maximum values of start and end. If you want the minimum and maximum instead, just use summarise(start = min(start), end = max(end)).
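The difference only shows up when the rows are not sorted. A minimal sketch, using made-up rows that are deliberately out of order (the names `by_position` and `by_value` are just for illustration):

```r
library(dplyr)

# Hypothetical data: rows for one seqname, deliberately out of order
gg <- data.frame(
  seqname = rep("scaffold02439", 3),
  start   = c(3560, 843, 5525),
  end     = c(3672, 1180, 5666)
)

# Positional: start of the first row, end of the last row
by_position <- gg %>%
  group_by(seqname) %>%
  summarise(start = first(start), end = last(end))

# Value-based: smallest start, largest end
by_value <- gg %>%
  group_by(seqname) %>%
  summarise(start = min(start), end = max(end))

print(by_position)
print(by_value)
```

Here the positional version gives start 3560, while the min/max version gives start 843; both give end 5666. If your data is already sorted by start within each seqname, the two should agree.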

Upvotes: 1
