How to remove the text before and after certain delimiter positions in R?

Question

I have strings that look like this:

tt <- c("16S_M_T1_R1_S1_S50_R1_001.fastq.gz", "16S_M_T1_R1_S1_S50_R2_001.fastq.gz", 
"16S_M_T1_R1_S2_S62_R1_001.fastq.gz")

I want to delete everything up to and including the 5th _ and everything from the 6th _ onward, keeping only the field between them. The result I want is: S50, S50, S62

I can do this in multiple steps, starting with something like sub("^(.*?_.*?_.*?_.*?_.*?_.*?)_.*", "\\1", tt), but I was wondering if there is a better one-step method.
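One way to get the one-step version asked about here is a single sub() call that skips five underscore-terminated fields with a bounded repeat and captures the sixth field. This is an editorial sketch, not taken from the original thread. It uses R's default (TRE) regex engine and assumes every string contains at least six underscores.

```r
tt <- c("16S_M_T1_R1_S1_S50_R1_001.fastq.gz", "16S_M_T1_R1_S1_S50_R2_001.fastq.gz",
        "16S_M_T1_R1_S2_S62_R1_001.fastq.gz")

# ^([^_]*_){5} : five "field + underscore" pairs, i.e. everything up to and including the 5th "_"
# ([^_]*)      : the 6th field, captured as group 2
# _.*          : the 6th "_" and everything after it
res <- sub("^([^_]*_){5}([^_]*)_.*", "\\2", tt)
res
#[1] "S50" "S50" "S62"
```

If a string has fewer than six underscores, the pattern does not match and sub() returns that string unchanged, so it may be worth checking the result for such cases.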

Maurits Evers · Accepted Answer

You could use strsplit:

sapply(strsplit(tt, "_"), "[[", 6)
#[1] "S50" "S50" "S62"

Explanation: strsplit is vectorised, so strsplit(tt, "_") splits every element of tt on "_" and returns a list with one character vector per string; sapply(..., "[[", 6) then extracts the 6th element from each of those vectors.
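The two steps in the explanation can be made visible with a short sketch that keeps the intermediate list around. Two choices here are editorial, not from the answer: fixed = TRUE makes strsplit treat "_" as a literal string rather than a regex, and vapply() with character(1) is used instead of sapply() so the result is guaranteed to be a character vector.

```r
tt <- c("16S_M_T1_R1_S1_S50_R1_001.fastq.gz", "16S_M_T1_R1_S1_S50_R2_001.fastq.gz",
        "16S_M_T1_R1_S2_S62_R1_001.fastq.gz")

# Step 1: split every string on "_"; the result is a list of character vectors
parts <- strsplit(tt, "_", fixed = TRUE)
# parts[[1]] is c("16S", "M", "T1", "R1", "S1", "S50", "R1", "001.fastq.gz")

# Step 2: call `[[`(x, 6) on each vector; vapply checks each result is one string
res <- vapply(parts, `[[`, character(1), 6)
res
#[1] "S50" "S50" "S62"
```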

Alternatively, you could use an explicit anonymous function:

sapply(strsplit(tt, "_"), function(x) x[6])
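The two forms differ when a string has fewer than six underscore-separated fields: x[6] returns NA, whereas "[[" with an out-of-range index stops with a "subscript out of bounds" error. The sketch below illustrates this; the second input string is a hypothetical example, not from the question.

```r
# Hypothetical input: the second string has only two "_"-separated fields
mixed <- c("16S_M_T1_R1_S1_S50_R1_001.fastq.gz", "short_name.fastq.gz")
parts <- strsplit(mixed, "_", fixed = TRUE)

# function(x) x[6] silently yields NA for vectors shorter than 6
with_anon <- sapply(parts, function(x) x[6])
# with_anon is c("S50", NA)

# "[[" with index 6 fails on the short vector; tryCatch() turns the error into FALSE
bracket_ok <- tryCatch({
  sapply(parts, "[[", 6)
  TRUE
}, error = function(e) FALSE)
bracket_ok
#[1] FALSE
```

Which behaviour is preferable depends on whether malformed filenames should stop the script or be carried along as NA.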
