Reputation: 3432
Hello, I have a df such as:
scaffolds start end
scaf1.1_0 1 40
scaf1.1_2 41 78
scaf1.1_3 79 300
seq_1f2.1 1 30
seq_1f3.1 1 90
seq_1f2.3 91 200
and I would like to subtract 1 from df$end only for the last of the duplicated df$scaffolds elements (those with a _number).
Here I should get the output:
scaffolds start end
scaf1.1_0 1 40
scaf1.1_2 41 78
scaf1.1_3 79 299
seq_1f2.1 1 30
seq_1f3.1 1 90
seq_1f2.3 91 199
where 300 - 1 = 299 because scaf1.1_3 was the last one, and 200 - 1 = 199 because seq_1f2.3 was the last one.
As you can see, seq_1f2.1 did not change because it did not have any _value.
Upvotes: 0
Views: 33
Reputation: 39613
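Maybe this is what you are looking for? Create grouping variables from the strings (the first three characters of scaffolds as the group, and its last character as a number) and, within each group, subtract 1 from end only in the row with the largest number: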
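library(tidyverse)
#Code
df %>%
  # Var1: first three characters of the name, used as the group
  mutate(Var1 = substr(scaffolds, 1, 3),
         # Var2: last character of the name, parsed as a number
         Var2 = as.numeric(substr(scaffolds, nchar(scaffolds), nchar(scaffolds)))) %>%
  group_by(Var1) %>%
  # Subtract 1 from end only in the row with the highest Var2 of each group
  mutate(end = ifelse(Var2 == max(Var2), end - 1, end)) %>%
  ungroup() %>%
  # Drop the helper columns
  select(-c(Var1, Var2))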
Output:
# A tibble: 6 x 3
scaffolds start end
<chr> <int> <dbl>
1 scaf1.1_0 1 40
2 scaf1.1_2 41 78
3 scaf1.1_3 79 299
4 seq_1f2.1 1 30
5 seq_1f3.1 1 90
6 seq_1f2.3 91 199
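In the output above, end is shown as <dbl> rather than <int>: end - 1 subtracts the double literal 1, so ifelse() returns a double. If keeping the integer type matters, subtracting 1L instead should avoid the conversion. A minimal base R illustration with toy vectors (end_col and is_last are hypothetical names, not part of the pipeline):

```r
# Toy vectors mirroring part of the end column (hypothetical, for illustration)
end_col <- c(40L, 78L, 300L)
is_last <- c(FALSE, FALSE, TRUE)

# 1 is a double literal, so end_col - 1 is double and ifelse() returns double
typeof(ifelse(is_last, end_col - 1, end_col))   # "double"

# 1L is an integer literal, so the result stays integer
typeof(ifelse(is_last, end_col - 1L, end_col))  # "integer"
```

In the pipeline above, the equivalent change would be end - 1L inside the ifelse() call.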
Some data used:
#Data
df <- structure(list(scaffolds = c("scaf1.1_0", "scaf1.1_2", "scaf1.1_3",
"seq_1f2.1", "seq_1f3.1", "seq_1f2.3"), start = c(1L, 41L, 79L,
1L, 1L, 91L), end = c(40L, 78L, 300L, 30L, 90L, 200L)), class = "data.frame", row.names = c(NA,
-6L))
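If the names in a group do not always share their first three characters, or a suffix can have more than one digit (e.g. _10), a base R sketch along these lines may be more robust. It assumes (this is not stated in the question) that the group is the name with its trailing _<digits> or .<digits> suffix removed, and that "last" means the last row of that group in the data frame; single-row groups such as seq_1f3 are left unchanged. The variable names prefix, is_last and is_dup are only illustrative.

```r
# Example data from the question
df <- data.frame(
  scaffolds = c("scaf1.1_0", "scaf1.1_2", "scaf1.1_3",
                "seq_1f2.1", "seq_1f3.1", "seq_1f2.3"),
  start = c(1L, 41L, 79L, 1L, 1L, 91L),
  end   = c(40L, 78L, 300L, 30L, 90L, 200L)
)

# Assumed grouping: strip a trailing "_<digits>" or ".<digits>" suffix
prefix <- sub("[._][0-9]+$", "", df$scaffolds)

# TRUE for the last row of each prefix (in row order)
is_last <- !duplicated(prefix, fromLast = TRUE)
# TRUE for rows whose prefix occurs more than once
is_dup <- duplicated(prefix) | duplicated(prefix, fromLast = TRUE)

# Subtract 1 from end only where both conditions hold (1L keeps end integer)
sel <- is_last & is_dup
df$end[sel] <- df$end[sel] - 1L
df
```

For the example data this gives end = 40, 78, 299, 30, 90, 199, matching the expected output.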
Upvotes: 1