user2300940

Reputation: 2385

Define if substring is matching left or right part of original string

I have a data frame with three string columns: mature, star and precursor. The mature and star columns are substrings of precursor. I would like to add a new column to the data frame that says whether the mature string matches the left or the right part of the precursor string. In my example, the first row matches the left part of its precursor and the second row matches the right part. Left and right should be defined relative to the middle of the precursor string. However, the substring is not always at the very beginning or the very end of the precursor; sometimes it starts at position 2 or 3.

Is there a way of doing this using stringr, or any other R package?

df <-     structure(list(mature = c("uggagugugacaaugguguuu", "cuauacaacuuacugucuuucc"
), star = c("aacgccauuaucacacuaaau", "ugagguaguagguuguauag"
), precursor = c("uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau", 
"ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc"
)), row.names = 1:2, class = "data.frame")

I looked at str_locate_all, which gives me the position of mature within precursor.

> str_locate_all(pattern =df$mature, df$precursor)
[[1]]
     start end
[1,]     1  21

[[2]]
     start end
[1,]    51  72

Upvotes: 0

Views: 118

Answers (2)

Ronak Shah

Reputation: 388807

You can use str_locate:

library(stringr)
mat <- str_locate(df$precursor, df$mature)
ifelse(nchar(df$precursor)/2 > mat[, 1], 'left', 'right')
#[1] "left"  "right"

This compares the starting position of the match with half the length of the precursor string and assigns 'left' or 'right' accordingly.
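To get the result into the data frame as a new column, the same comparison can be wrapped in a small function and applied to both substring columns. This is only a sketch: the helper name side_of and the column names mature_side and star_side are made up for illustration, and the data frame is the one from the question.

```r
library(stringr)

# Data from the question.
df <- data.frame(
  mature = c("uggagugugacaaugguguuu", "cuauacaacuuacugucuuucc"),
  star = c("aacgccauuaucacacuaaau", "ugagguaguagguuguauag"),
  precursor = c(
    "uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau",
    "ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc"
  )
)

# Hypothetical helper: "left" if the first match starts before the
# halfway point of the full string, "right" otherwise.
# Gives NA when the substring is not found.
side_of <- function(sub, full) {
  start <- str_locate(full, sub)[, "start"]
  ifelse(start < nchar(full) / 2, "left", "right")
}

df$mature_side <- side_of(df$mature, df$precursor)
df$star_side <- side_of(df$star, df$precursor)
```

Because only the start position is compared, a match that begins at position 2 or 3 is still labelled 'left'.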

Upvotes: 1

DaveArmstrong

Reputation: 21757

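This should do it. Instead of using str_locate(), you could use str_detect() with the beginning-of-string (^) and end-of-string ($) regex anchors. Note that this only labels a match that sits at the very start or the very end of precursor; anything else falls into "Neither".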

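library(dplyr)    # mutate(), case_when(), %>%
library(stringr)  # str_detect()

df %>% 
  mutate(mature_side = case_when(
    str_detect(precursor, paste0("^", mature)) ~ "Left",   # mature at the very start
    str_detect(precursor, paste0(mature, "$")) ~ "Right",  # mature at the very end
    TRUE ~ "Neither"
  ))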

#                   mature                  star                                                                precursor
# 1  uggagugugacaaugguguuu aacgccauuaucacacuaaau                uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau
# 2 cuauacaacuuacugucuuucc  ugagguaguagguuguauag ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc
#   mature_side
# 1        Left
# 2       Right

Upvotes: 4
