Reputation: 7
# gg is my data frame; it contains 1000 different seqname values
seqname start end
scaffold02439 843 1180
scaffold02439 928 1180
scaffold02439 3560 3672
scaffold02439 3560 3672
scaffold02439 5525 5666
scaffold02439 5525 5666
# I want to take the start and end values for each seqname
c <- unique(gg$seqname)
l1 <- gg %>% filter(seqname %in% c[1] )
l1a <- l1[1,2]
l1b <- l1[nrow(l1),3]
l1 <- cbind(c[1],l1a,l1b)
l2 <- gg %>% filter(seqname %in% c[2])
l2a <- l2[1,2]
l2b <- l2[nrow(l2),3]
l2 <- cbind(c[2],l2a,l2b)
How can I write this as a loop that, for each seqname, takes the start (2nd column) of the first row and the end (3rd column) of the last row?
Upvotes: 1
Views: 44
Reputation: 887691
We can also use data.table
library(data.table)
setDT(gg)[, .(start = first(start), end = last(end)), seqname]
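Note that setDT() converts gg to a data.table by reference. If you would rather leave the original data frame untouched, something like the following should work. This is only a sketch: gg_demo is a made-up name holding the sample rows from the question, and as.data.table() is used in place of setDT() so that a copy is made.

```r
library(data.table)

# Sample rows from the question (gg_demo is a hypothetical name)
gg_demo <- data.frame(
  seqname = rep("scaffold02439", 6),
  start   = c(843, 928, 3560, 3560, 5525, 5525),
  end     = c(1180, 1180, 3672, 3672, 5666, 5666)
)

# as.data.table() returns a copy, so gg_demo stays a plain data.frame
res <- as.data.table(gg_demo)[
  , .(start = first(start), end = last(end)),
  by = seqname
]
```

For the sample rows, res should contain a single row for scaffold02439 with the start of its first row and the end of its last row.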
Upvotes: 0
Reputation: 18425
You could do something like this
library(dplyr)
gg %>%
  group_by(seqname) %>% # group by seqname
  summarise(start = first(start), end = last(end)) # extract first and last
This will return the first and last entries, which are not necessarily the minimum and maximum values of start and end. If you want those instead, just use summarise(start = min(start), end = max(end)).
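To see the difference between the two variants, here is a small sketch on made-up data that is deliberately not sorted by start. The gg_demo data frame and its values are assumptions for illustration only, not the asker's data.

```r
library(dplyr)

# Hypothetical data, deliberately not sorted by start
gg_demo <- data.frame(
  seqname = c("scaffoldA", "scaffoldA", "scaffoldA"),
  start   = c(928, 843, 3560),
  end     = c(3672, 1180, 1180)
)

# Start of the first row and end of the last row, in row order
by_position <- gg_demo %>%
  group_by(seqname) %>%
  summarise(start = first(start), end = last(end))

# Smallest start and largest end, regardless of row order
by_value <- gg_demo %>%
  group_by(seqname) %>%
  summarise(start = min(start), end = max(end))
```

Here by_position gives start 928 and end 1180, while by_value gives start 843 and end 3672. If your data is already sorted within each seqname, as in the question, the two will usually agree.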
Upvotes: 1