I have three strings:
x <- "PB0038.1_Jundm2_1/Jaspar.instid_chr1:183286850-183287250.bin1"
y <- "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin22"
z <- "Arid3a/MA0151.1/Jaspar.instid_chr1:183286849-183287249.bin10"
The regex
^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)
works fine for strings y and z, but not for x.
> stringr::str_match(y,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
[1] "Ddit3::Cebpa" "chr1:183286845-183287245" "22"
> stringr::str_match(z,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
[1] "Arid3a" "chr1:183286849-183287249" "10"
> stringr::str_match(x,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
[1] NA NA NA
How can I modify it? The desired end result for x is:
"PB0038.1_Jundm2_1", "chr1:183286850-183287250", "1"
Your x input does not, and should not, match, because it contains only one forward slash while your pattern expects two. If you want to allow either one or two forward slashes, one possible modification to your pattern is the following:
str_match(x, "^(.*?)\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
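A minimal sketch, assuming the stringr package is installed: because str_match() is vectorised, the modified pattern can be checked against all three strings in a single call. The names pattern and res are only illustrative.

```r
library(stringr)

x <- "PB0038.1_Jundm2_1/Jaspar.instid_chr1:183286850-183287250.bin1"
y <- "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin22"
z <- "Arid3a/MA0151.1/Jaspar.instid_chr1:183286849-183287249.bin10"

# Only one slash is required: everything between the first "/" and
# ".instid_" is consumed by the lazy ".*?", whether or not it contains
# another "/".
pattern <- "^(.*?)\\/.*?\\.instid_(.*?)\\.bin(\\d+)"

# Column 1 is the full match; columns 2-4 are the three capture groups,
# with one row per input string.
res <- str_match(c(x, y, z), pattern)[, 2:4]
res
```

Each row of res should then hold the name, the coordinates and the bin number for x, y and z respectively.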
You might find the above pattern acceptable because you are only capturing what comes before the first slash. The other two captures come after the .instid_ token and at the very end, after the .bin suffix. None of these captures appears to depend on the number of slashes in the path. Note that, strictly speaking, the modified pattern accepts one or more slashes, not only one or two.
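If you would rather have the pattern reject paths with three or more slashes, one possible variant (my own suggestion, not the pattern above) replaces the lazy .*? path segments with [^/]* and makes the middle segment an optional non-capturing group. This sketch uses base R's regexec() with perl = TRUE so it needs no extra packages; the names s, n_slash, strict and parts are hypothetical.

```r
x <- "PB0038.1_Jundm2_1/Jaspar.instid_chr1:183286850-183287250.bin1"
y <- "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin22"
z <- "Arid3a/MA0151.1/Jaspar.instid_chr1:183286849-183287249.bin10"
s <- c(x, y, z)

# Count forward slashes: x has one, y and z have two, which is why the
# original two-slash pattern fails on x.
n_slash <- lengths(regmatches(s, gregexpr("/", s, fixed = TRUE)))

# "(?:[^/]*/)?" optionally consumes one extra slash-terminated segment,
# so exactly one or two slashes are accepted before ".instid_".
strict <- "^([^/]*)/(?:[^/]*/)?[^/]*?\\.instid_(.*?)\\.bin(\\d+)"

m <- regmatches(s, regexec(strict, s, perl = TRUE))
# Each element is c(full match, group 1, group 2, group 3); keep the
# groups and transpose so there is one row per input string.
parts <- t(vapply(m, function(v) v[2:4], character(3)))
parts
```

The trade-off is that [^/]* forbids slashes inside the name and path segments, which seems fine for inputs like these but is an assumption about your data.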