Rubén Coca

Reputation: 61

Regex to find the text between the nth and (n+1)th occurrence of a character

I would like to capture the characters between the 1st and 2nd occurrence of '_' in this string:

C2_Sperd20A_XXX_20170301_20170331

That is:

Sperd20A

Thank you

Upvotes: 0

Views: 2042

Answers (2)

akrun

Reputation: 887118

We can use sub. From the start (^) of the string, match zero or more characters that are not a _ ([^_]*), followed by a _, then capture one or more characters that are not a _ as a group (([^_]+)), followed by another _ and the remaining characters (.*). The whole match is then replaced with the backreference (\\1) to the captured group.

sub("^[^_]*_([^_]+)_.*", "\\1", str1)
#[1] "Sperd20A"

Or, for the text between the 2nd and 3rd _, repeat the leading group twice ({2}) and return the second captured group (\\2)

sub("^([^_]*_){2}([^_]+).*", "\\2", str1)
#[1] "XXX"
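
The two patterns above differ only in the repeat count, so they can be generated for any n. Below is a minimal sketch of that idea, not part of the original answer: the function name between_nth and its arguments are hypothetical, and it assumes the separator is a single character with no special meaning inside a regex character class. Like the pattern above, it does not require a closing separator after the captured text.

```r
# Hypothetical helper: return the text that follows the n-th separator,
# up to the next separator (or the end of the string).
between_nth <- function(x, n, sep = "_") {
  # Build e.g. "^(?:[^_]*_){2}([^_]*).*$" for n = 2
  pattern <- sprintf("^(?:[^%s]*%s){%d}([^%s]*).*$", sep, sep, as.integer(n), sep)
  # sub() returns its input unchanged when nothing matches,
  # so strings with fewer than n separators are mapped to NA instead
  hit <- grepl(pattern, x, perl = TRUE)
  ifelse(hit, sub(pattern, "\\1", x, perl = TRUE), NA_character_)
}

str1 <- "C2_Sperd20A_XXX_20170301_20170331"
between_nth(str1, 1)  # "Sperd20A"
between_nth(str1, 2)  # "XXX"
between_nth(str1, 5)  # NA (the string has only four separators)
```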

Another option is strsplit, which splits the string on _ so that the wanted piece can be picked by position

strsplit(str1, "_")[[1]][2]
#[1] "Sperd20A"

For the text between the 2nd and 3rd _, take the third element instead

strsplit(str1, "_")[[1]][3]
#[1] "XXX"
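
strsplit returns a list with one element per input string, so the same indexing can be applied to a whole vector. The following is a rough sketch under that assumption, not code from the original answer; the name nth_field is hypothetical. Field n is the text between the (n-1)th and nth separator, and strings with too few fields give NA.

```r
# Hypothetical helper: pick the n-th "_"-separated field of every string in x.
nth_field <- function(x, n, sep = "_") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(
    parts,
    function(p) if (length(p) >= n) p[n] else NA_character_,
    character(1)
  )
}

x <- c("C2_Sperd20A_XXX_20170301_20170331", "A_B")
nth_field(x, 2)  # "Sperd20A" "B"
nth_field(x, 3)  # "XXX" NA
```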

###data

str1 <- "C2_Sperd20A_XXX_20170301_20170331"

Upvotes: 8

Samuel

Reputation: 3053

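Another option is str_extract from the stringr package, with a lookbehind and a lookahead so that the underscores themselves are not part of the match. str_extract returns only the first match, which here is the text between the 1st and 2nd _: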

library(stringr)
s <- "C2_Sperd20A_XXX_20170301_20170331"

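# (?<=_) lookbehind: the match must be preceded by "_"
# (?=_)  lookahead: the match must be followed by "_"
# .*?    lazy match: stop at the first "_" that follows
str_extract(string = s, pattern = "(?<=_)(.*?)(?=_)")
#[1] "Sperd20A"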
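
Because only the first match is returned, the call above cannot reach later positions. One way to get them is to collect every match and index into the result. This is a sketch in base R rather than stringr, using regmatches and gregexpr with perl = TRUE for lookaround support; the variable name fields is made up for illustration, and [^_]+ assumes there are no empty fields (no two consecutive underscores).

```r
s <- "C2_Sperd20A_XXX_20170301_20170331"

# Collect every run of non-"_" characters that has a "_" on both sides
fields <- regmatches(s, gregexpr("(?<=_)[^_]+(?=_)", s, perl = TRUE))[[1]]

fields     # "Sperd20A" "XXX" "20170301"
fields[2]  # "XXX", the text between the 2nd and 3rd "_"
```

The last piece ("20170331") is not returned because no _ follows it.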

Upvotes: 1
