chippycentra

Reputation: 3432

Subtract 1 from a column for specific ending rows in R

Hello, I have a df such as:

scaffolds start end 
scaf1.1_0 1     40
scaf1.1_2 41    78
scaf1.1_3 79    300 
seq_1f2.1 1     30
seq_1f3.1 1     90
seq_1f2.3 91    200

I would like to subtract 1 from df$end only for the last element of each duplicated df$scaffolds group (those with a _number suffix).

Here I should get the output:

scaffolds start end 
scaf1.1_0 1     40
scaf1.1_2 41    78
scaf1.1_3 79    299 
seq_1f2.1 1     30
seq_1f3.1 1     90
seq_1f2.3 91    199

where 300-1 = 299 because scaf1.1_3 was the last one, and 200-1 = 199 because seq_1f2.3 was the last one.

As you can see, seq_1f2.1 did not change because it did not have any _value.

Upvotes: 0

Views: 33

Answers (1)

Duck

Reputation: 39613

Maybe this is what you are looking for? Create grouping variables from the scaffold names, then subtract 1 from end on the row with the largest trailing number in each group:

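library(tidyverse)
#Code
df %>%
  # Var1: first three characters of the name, used as the group key
  # Var2: last character of the name, converted to a number
  mutate(Var1=substr(scaffolds,1,3),
         Var2=as.numeric(substr(scaffolds,nchar(scaffolds),nchar(scaffolds)))) %>%
  group_by(Var1) %>%
  # Subtract 1 from end only on rows holding the largest Var2 in their group
  mutate(end=ifelse(Var2==max(Var2),end-1,end)) %>% ungroup() %>%
  # Drop the helper columns
  select(-c(Var1,Var2))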

Output:

# A tibble: 6 x 3
  scaffolds start   end
  <chr>     <int> <dbl>
1 scaf1.1_0     1    40
2 scaf1.1_2    41    78
3 scaf1.1_3    79   299
4 seq_1f2.1     1    30
5 seq_1f3.1     1    90
6 seq_1f2.3    91   199

Some data used:

#Data
df <- structure(list(scaffolds = c("scaf1.1_0", "scaf1.1_2", "scaf1.1_3", 
"seq_1f2.1", "seq_1f3.1", "seq_1f2.3"), start = c(1L, 41L, 79L, 
1L, 1L, 91L), end = c(40L, 78L, 300L, 30L, 90L, 200L)), class = "data.frame", row.names = c(NA, 
-6L))
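
The grouping above relies on the first three characters and the last single character of each name, so it may not carry over to names with multi-digit suffixes or to prefixes that share their first three letters. As a rough alternative sketch in base R, and only under my own reading of the question, one could split each name at its final "." or "_" and treat the part before it as the group. The names prefix, suffix, group_size, group_max, is_last and result are illustrative and not from the original post, and the rule "only groups with more than one row are changed" is an assumption inferred from the expected output.

```r
# Sample data copied from the answer above.
df <- data.frame(
  scaffolds = c("scaf1.1_0", "scaf1.1_2", "scaf1.1_3",
                "seq_1f2.1", "seq_1f3.1", "seq_1f2.3"),
  start = c(1L, 41L, 79L, 1L, 1L, 91L),
  end   = c(40L, 78L, 300L, 30L, 90L, 200L),
  stringsAsFactors = FALSE
)

# ASSUMPTION: every name ends in "." or "_" followed by digits.
# prefix = everything before that final separator (the group key),
# suffix = the trailing digits as an integer.
prefix <- sub("[._][0-9]+$", "", df$scaffolds)
suffix <- as.integer(sub("^.*[._]([0-9]+)$", "\\1", df$scaffolds))

# Per-row group size and largest suffix within the row's group.
group_size <- ave(suffix, prefix, FUN = length)
group_max  <- ave(suffix, prefix, FUN = max)

# ASSUMPTION: only groups with more than one row count as "duplicated";
# within those, the row with the largest suffix is the "last" one.
is_last <- group_size > 1L & suffix == group_max

result <- df
result$end[is_last] <- result$end[is_last] - 1L
print(result)
```

With the sample data this should decrement only scaf1.1_3 and seq_1f2.3, leaving seq_1f3.1 alone because its prefix occurs once. Names that do not match the assumed pattern would give NA suffixes and need separate handling.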

Upvotes: 1
