RAS

Reputation: 121

Regex to extract values between 2 underscores, including a value that is an underscore

I am working in R and trying to extract the part of a character string that lies between two underscores, where the part itself contains an underscore:

WRAP_384_p1_QC1_8
WRAP_384_p3_QC1_7   

I wish to obtain an output like this:

1_QC1
3_QC1

What regex do I need to extract this information?

Upvotes: 1

Views: 4880

Answers (1)

akrun

Reputation: 887118

We can use gsub to match zero or more characters (.*) followed by a _ and a lower case letter ([a-z]), or (|) a _ followed by one or more digits (\\d+) at the end ($) of the string, and replace each match with an empty string ("").

gsub(".*_[a-z]|_\\d+$", "", str1)
#[1] "1_QC1" "3_QC1"
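To see what each side of the alternation removes, the same pattern can be applied in two steps. This is only an illustrative sketch: the intermediate names step1 and step2 are made up here, and the input is copied from the question.

```r
# Example input copied from the question
str1 <- c("WRAP_384_p1_QC1_8", "WRAP_384_p3_QC1_7")

# First alternative: greedy prefix up to and including "_" + lower case letter
step1 <- sub(".*_[a-z]", "", str1)
# step1 is c("1_QC1_8", "3_QC1_7")

# Second alternative: trailing "_" + digits at the end of the string
step2 <- sub("_\\d+$", "", step1)
# step2 is c("1_QC1", "3_QC1")
```

Because .* is greedy, the prefix match runs up to the last underscore that is followed by a lower case letter, which in these strings is the _p.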

Or use sub with capture groups. From the start (^) of the string, the first group (([^_]+_){2}) matches two instances of one or more non-underscore characters followed by an underscore, and is followed by a lower case letter ([a-z]). The second group ((\\d+_[[:alnum:]]+)) captures one or more digits (\\d+), a _, and one or more alphanumeric characters ([[:alnum:]]+). After it comes an underscore (_) and one or more digits (\\d+). We replace the whole match with the second capture group (\\2).

sub("^([^_]+_){2}[a-z](\\d+_[[:alnum:]]+)_\\d+", "\\2", str1)
#[1] "1_QC1" "3_QC1"
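A different route, not one of the two above, is to extract the wanted part directly instead of deleting everything around it. The sketch below uses base R's regmatches and regexpr with perl = TRUE so that lookarounds are available. The name res is just for illustration, and it assumes every element of str1 matches, since regmatches silently drops elements that do not.

```r
# Example input copied from the question
str1 <- c("WRAP_384_p1_QC1_8", "WRAP_384_p3_QC1_7")

# Match digits, "_", alphanumerics, but only when preceded by "_" + lower case
# letter and followed by "_" + digits at the end of the string.
# Lookarounds require perl = TRUE.
m <- regexpr("(?<=_[a-z])\\d+_[[:alnum:]]+(?=_\\d+$)", str1, perl = TRUE)

# Pull out the matched substrings (non-matching elements would be dropped)
res <- regmatches(str1, m)
# res is c("1_QC1", "3_QC1")
```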

data

str1 <- c("WRAP_384_p1_QC1_8", "WRAP_384_p3_QC1_7")

Upvotes: 6
