arielle

Reputation: 945

Extract string between nth and ith instance of delimiter

I have a vector of strings, similar to this one, but with many more elements:

s <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "TC-V-576_T_90.0_DV_151_0", "TCA-DV-X_T_6.0_D_A2_07", "T-V-Z_T_2_D_A_0", "CGA-DV-AW0_T.1_24.4.0_V_A6_7", "ACGA-DV-A4W0_T_274.46.0_DV_A266_07")

I would like a function that extracts the substring between the nth and ith instances of the delimiter "_". For example, the string between the 2nd (n = 2) and 3rd (i = 3) instances should give this:

[1] "90.67.0"  "90.0"     "6.0"      "2"        "24.4.0"   "274.46.0"

Or if n = 4 and i = 5:

[1] "1541" "151"  "A2"   "A"    "A6"   "A266"

Any suggestions?

Upvotes: 5

Views: 2384

Answers (3)

lmo

Reputation: 38500

A third method, which uses gregexpr to find the delimiter positions and substring for the extraction:

# extract positions of "_" from each vector element, returns a list
spots <- gregexpr("_", s, fixed=TRUE)

# extract text in between third and fifth underscores
substring(s, sapply(spots, "[", 3) + 1, sapply(spots, "[", 5) - 1)
# [1] "DV_1541" "DV_151"  "D_A2"    "D_A"     "V_A6"    "DV_A266"
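
The same approach can be wrapped in a function that takes n and i as arguments. This is only a sketch: the name between_delims and its delim argument are made up for illustration, and it assumes every element contains at least i delimiters.

```r
# Hypothetical helper; the name and arguments are illustrative only.
between_delims <- function(x, n, i, delim = "_") {
  # positions of every delimiter in each element (a list of integer vectors)
  spots <- gregexpr(delim, x, fixed = TRUE)
  # position of the k-th delimiter in each element
  kth <- function(k) vapply(spots, function(p) p[k], integer(1))
  first <- kth(n) + nchar(delim)  # first character after the nth delimiter
  last  <- kth(i) - 1L            # last character before the ith delimiter
  substring(x, first, last)
}

x <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "T-V-Z_T_2_D_A_0")
between_delims(x, 2, 3)
# [1] "90.67.0" "2"
between_delims(x, 3, 5)
# [1] "DV_1541" "D_A"
```

Using vapply instead of sapply here is a design choice: it guarantees an integer vector even when x is empty.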

Upvotes: 4

d.b

Reputation: 32548

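# FUNCTION
# Split each string on "_" and paste pieces (n+1) through i back together,
# which is the text between the nth and ith delimiter.
foo = function(x, n, i){
    do.call(c, lapply(x, function(X)
        paste(unlist(strsplit(X, "_"))[(n+1):(i)], collapse = "_")))
}

# USAGE
foo(x = s, n = 3, i = 5)
# [1] "DV_1541" "DV_151"  "D_A2"    "D_A"     "V_A6"    "DV_A266"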

Upvotes: 5

G5W

Reputation: 37641

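You can do this with sub and gsub

n = 2
i = 3

# drop everything up to and including the nth "_"
# (sub, not gsub: gsub would keep removing further groups of n pieces)
pattern1 = paste0("(.*?_){", n,  "}")
temp = sub(pattern1, "", s)
# keep the next (i - n) "_"-terminated pieces and discard the rest
pattern2 = paste0("((.*?_){", i-n,  "}).*")
temp = gsub(pattern2, "\\1", temp)
# strip the trailing "_"
temp = gsub("_$", "", temp)
temp
# [1] "90.67.0"  "90.0"     "6.0"      "2"        "24.4.0"   "274.46.0"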
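
The regex idea can also be collapsed into one call with a capture group. The following is a rough sketch, not a tested general solution: extract_between is a hypothetical name, it hard-codes "_" as the delimiter, it assumes n < i, and it uses perl = TRUE so that the non-capturing groups are available. Strings with fewer than i delimiters do not match the pattern, so sub returns them unchanged.

```r
# Hypothetical helper; assumes "_" as the delimiter and n < i.
extract_between <- function(x, n, i) {
  # skip n "_"-terminated pieces, then capture everything up to
  # (but not including) the ith "_"
  pattern <- sprintf("^(?:[^_]*_){%d}((?:[^_]*_){%d}[^_]*)_.*$", n, i - n - 1)
  sub(pattern, "\\1", x, perl = TRUE)
}

x <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "CGA-DV-AW0_T.1_24.4.0_V_A6_7")
extract_between(x, 2, 3)
# [1] "90.67.0" "24.4.0"
extract_between(x, 4, 5)
# [1] "1541" "A6"
```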

Upvotes: 5
