Reputation: 2385
I have a data frame with three string columns: mature, star and precursor.
The values in mature and star are substrings of precursor. I would like to add a new column to the data frame that says whether the mature string matches the left or the right part of the precursor string.
In my example, the first row matches the left part of its precursor and the second row matches the right part of its precursor. Left and right should be defined relative to the middle of the precursor string. However, the substring is not always at the very beginning or the very end of the precursor; sometimes it starts at position 2 or 3.
Is there a way of doing this using stringr, or any other R package?
df <- structure(list(mature = c("uggagugugacaaugguguuu", "cuauacaacuuacugucuuucc"
), star = c("aacgccauuaucacacuaaau", "ugagguaguagguuguauag"
), precursor = c("uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau",
"ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc"
)), row.names = 1:2, class = "data.frame")
I looked at str_locate_all which gives me the position of the mature relative to the precursor.
> str_locate_all(pattern = df$mature, df$precursor)
[[1]]
     start end
[1,]     1  21

[[2]]
     start end
[1,]    51  72
Upvotes: 0
Views: 118
Reputation: 388807
You can use str_locate:
library(stringr)
mat <- str_locate(df$precursor, df$mature)
ifelse(nchar(df$precursor)/2 > mat[, 1], 'left', 'right')
#[1] "left" "right"
This compares the starting position of the match with half the length of the precursor and assigns 'left' or 'right' accordingly.
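If you want the comparison to be a bit more robust, here is a minimal sketch (not part of the answer above) that compares the midpoint of the match with the midpoint of the precursor and wraps the pattern in fixed() so it is treated as a literal string. The column name mature_side and the helper variables loc, match_mid and prec_mid are my own choices, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(stringr)

# Example data copied from the question (star column omitted for brevity).
df <- data.frame(
  mature = c("uggagugugacaaugguguuu", "cuauacaacuuacugucuuucc"),
  precursor = c(
    "uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau",
    "ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc"
  )
)

# Locate the first literal occurrence of each mature string in its precursor.
# Rows with no match get NA for both start and end.
loc <- str_locate(df$precursor, fixed(df$mature))

# Midpoint of the match versus midpoint of the precursor.
match_mid <- (loc[, "start"] + loc[, "end"]) / 2
prec_mid  <- (nchar(df$precursor) + 1) / 2

# NA propagates through ifelse(), so unmatched rows stay NA.
df$mature_side <- ifelse(match_mid <= prec_mid, "left", "right")
df$mature_side
```

Using the match midpoint rather than the start position only matters for matches that straddle the middle of the precursor; for the two rows in the question both approaches agree.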
Upvotes: 1
Reputation: 21757
This should do it. Instead of using str_locate(), you could use str_detect() with the start-of-string (^) and end-of-string ($) regex anchors.
library(dplyr)
library(stringr)

df %>%
  mutate(mature_side = case_when(
    str_detect(precursor, paste0("^", mature)) ~ "Left",
    str_detect(precursor, paste0(mature, "$")) ~ "Right",
    TRUE ~ "Neither"
  ))
#   mature star precursor
# 1 uggagugugacaaugguguuu aacgccauuaucacacuaaau uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau
# 2 cuauacaacuuacugucuuucc ugagguaguagguuguauag ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc
# mature_side
# 1 Left
# 2 Right
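
Note that ^ and $ only match when the substring sits at the very first or very last position. Since the question mentions that the match sometimes starts at position 2 or 3, here is a rough sketch of one way to allow a small offset. The tolerance max_offset = 3 is an assumption on my part, and the third precursor is a made-up example (the first precursor with "aa" prepended) just to show the offset case. It also assumes the strings contain only letters, so they need no regex escaping.

```r
library(stringr)

# Hypothetical tolerance: how many extra characters may precede
# (left) or follow (right) the mature string.
max_offset <- 3

# First two rows are from the question; the third is invented for illustration.
precursor <- c(
  "uggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau",
  "ugagguaguagguuguauaguuuuagggucauucccaagcugucagaugacuauacaacuuacugucuuucc",
  "aauggagugugacaaugguguuuguguccuccguaucaaacgccauuaucacacuaaau"
)
mature <- c(
  "uggagugugacaaugguguuu",
  "cuauacaacuuacugucuuucc",
  "uggagugugacaaugguguuu"
)

# Allow up to max_offset arbitrary characters before/after the match.
left_pattern  <- paste0("^.{0,", max_offset, "}", mature)
right_pattern <- paste0(mature, ".{0,", max_offset, "}$")

side <- ifelse(
  str_detect(precursor, left_pattern), "Left",
  ifelse(str_detect(precursor, right_pattern), "Right", "Neither")
)
side
```

The same two patterns could be dropped into the case_when() call above in place of the plain ^ and $ versions.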
Upvotes: 4