I have a vector of strings, similar to this one, but with many more elements:
s <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "TC-V-576_T_90.0_DV_151_0", "TCA-DV-X_T_6.0_D_A2_07", "T-V-Z_T_2_D_A_0", "CGA-DV-AW0_T.1_24.4.0_V_A6_7", "ACGA-DV-A4W0_T_274.46.0_DV_A266_07")
I would like a function that extracts the substring between the nth and ith occurrences of the delimiter "_". For example, taking the text between the 2nd (n = 2) and 3rd (i = 3) occurrences should give:
[1] "90.67.0" "90.0" "6.0" "2" "24.4.0" "274.46.0"
Or, if n = 4 and i = 5:
[1] "1541" "151" "A2" "A" "A6" "A266"
Any suggestions?
Reputation: 38500
A third method, which uses substring for the extraction and gregexpr to find the underscore positions:
# find the positions of every "_" in each vector element; returns a list
spots <- gregexpr("_", s, fixed=TRUE)
# extract text in between third and fifth underscores
substring(s, sapply(spots, "[", 3) + 1, sapply(spots, "[", 5) - 1)
[1] "DV_1541" "DV_151" "D_A2" "D_A" "V_A6" "DV_A266"
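The answer above hard-codes the 3rd and 5th underscores. A minimal sketch of the same gregexpr/substring idea wrapped in a function with n and i arguments might look like this; extract_between and its delim argument are hypothetical names, not part of the answer or of base R. Elements with fewer than i underscores come back as NA.

```r
# Hypothetical helper generalizing the gregexpr/substring answer:
# returns the text between the nth and ith delimiter of each element of x.
extract_between <- function(x, n, i, delim = "_") {
  # positions of every delimiter in each element (a list of integer vectors)
  spots <- gregexpr(delim, x, fixed = TRUE)
  # position of the nth and ith delimiter; NA when an element has fewer delimiters
  start <- vapply(spots, function(p) p[n], integer(1))
  end   <- vapply(spots, function(p) p[i], integer(1))
  # substring() yields NA wherever start or end is NA
  substring(x, start + 1, end - 1)
}
```

With the vector s from the question, extract_between(s, 2, 3) and extract_between(s, 4, 5) return the two results the question asks for.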
Reputation: 32548
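#FUNCTION
# Return the text between the nth and ith "_" in each element of x
foo <- function(x, n, i) {
  # for each string: split on "_", keep fields n+1 to i, and glue them back with "_"
  do.call(c, lapply(x, function(X)
    paste(unlist(strsplit(X, "_"))[(n + 1):i], collapse = "_")))
}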
#USAGE
foo(x = s, n = 3, i = 5)
#[1] "DV_1541" "DV_151" "D_A2" "D_A" "V_A6" "DV_A266"
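When an element has fewer fields than requested, the indexing in foo() runs past the end and returns NA values, which paste() turns into the literal text "NA" (for example "NA_NA"). A sketch of a variant that uses vapply and returns a real NA instead; foo2 and the delim argument are hypothetical, and the paste0(x, delim) step is only there because strsplit() drops a trailing empty field, which would otherwise throw off the field count.

```r
# Hypothetical variant of foo(): same strsplit idea, vectorised with vapply,
# returning NA when an element has fewer than i delimiters.
foo2 <- function(x, n, i, delim = "_") {
  # appending delim keeps a trailing empty field, so length(p) == delimiters + 1;
  # fixed = TRUE treats delim literally rather than as a regex
  pieces <- strsplit(paste0(x, delim), delim, fixed = TRUE)
  vapply(pieces, function(p) {
    # no ith delimiter means there is nothing "between" to return
    if (length(p) <= i) return(NA_character_)
    # the text between the nth and ith delimiters is fields n+1 to i
    paste(p[(n + 1):i], collapse = delim)
  }, character(1))
}
```

For the question's s, foo2(s, 3, 5) gives the same result as foo(s, 3, 5).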
Reputation: 37641
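You can do this with sub and two regular expressions:
n <- 2
i <- 3
# drop the first n fields; the ^ anchor and sub() ensure only the leading n are removed
# (an unanchored gsub() keeps matching and strips later groups of n fields too)
pattern1 <- paste0("^(.*?_){", n, "}")
temp <- sub(pattern1, "", s, perl = TRUE)
# keep the next (i - n) fields, including their trailing "_", and drop the rest
pattern2 <- paste0("^((.*?_){", i - n, "}).*")
temp <- sub(pattern2, "\\1", temp, perl = TRUE)
# strip the trailing "_"
temp <- sub("_$", "", temp)
temp
[1] "90.67.0" "90.0" "6.0" "2" "24.4.0" "274.46.0"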
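As an alternative to the multi-step substitution, a single anchored pattern can skip the first n fields and capture everything up to the ith underscore. This sketch swaps the lazy .*? for the negated class [^_]*, which cannot run past a delimiter; extract_regex is a hypothetical name, it assumes i > n, and it returns NA for elements that have no ith underscore instead of passing them through unchanged.

```r
# Hypothetical one-pass regex extractor (not from the answers above).
extract_regex <- function(x, n, i) {
  stopifnot(i > n)
  # ^(?:[^_]*_){n}              skip n "_"-terminated fields
  # ([^_]*(?:_[^_]*){i-n-1})    capture the i - n fields in between
  # _                           require the ith underscore
  pattern <- paste0("^(?:[^_]*_){", n, "}([^_]*(?:_[^_]*){", i - n - 1, "})_")
  out <- rep(NA_character_, length(x))
  hit <- grepl(pattern, x, perl = TRUE)
  # replace the whole string with the captured group for matching elements only
  out[hit] <- sub(paste0(pattern, ".*$"), "\\1", x[hit], perl = TRUE)
  out
}
```

Because [^_]* can never consume an underscore, the pattern does not depend on lazy-quantifier behaviour, and the grepl() check keeps non-matching elements from leaking through as their original strings.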