How to extract a substring by inverse pattern with R?

Question

I am trying to extract a substring by pattern using R's gsub() function.

# Example: extracting "7 years" substring.
string <- "Psychologist - 7 years on the website, online"
gsub(pattern="[0-9]+\\s+\\w+", replacement="", string)

[1] "Psychologist -  on the website, online"

As you can see, it's easy to exclude the needed substring using gsub(), but I need to invert the result and get only "7 years". I thought about using "^", something like this:

gsub(pattern="[^[0-9]+\\s+\\w+]", replacement="", string)

Could anyone help me with the correct regex pattern?

Wiktor Stribiżew · Accepted Answer

You may use

sub(pattern=".*?([0-9]+\\s+\\w+).*", replacement="\\1", string)

See this R demo.

Details

.*? - any 0+ chars, as few as possible
([0-9]+\s+\w+) - Capturing group 1:
- [0-9]+ - one or more digits
- \s+ - 1 or more whitespaces
- \w+ - 1 or more word chars
.* - the rest of the string (any 0+ chars, as many as possible)

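The \1 in the replacement (written "\\1" in an R string literal, since backslashes must be doubled there) inserts the contents of Group 1. Because the pattern matches the whole string, the result is just the captured substring.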
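As a rough sketch of how this behaves, the snippet below runs the capture-group approach and compares it with one possible alternative: base R's regmatches() combined with regexpr(), which extracts the first match directly instead of replacing everything around it. The no_match string is a hypothetical input added only to illustrate the no-match case; it is not part of the original question.

```r
string <- "Psychologist - 7 years on the website, online"

# Capture-group approach: match the whole string, keep only Group 1.
by_sub <- sub(".*?([0-9]+\\s+\\w+).*", "\\1", string)

# Alternative: locate the first match and pull it out directly.
by_regmatches <- regmatches(string, regexpr("[0-9]+\\s+\\w+", string))

print(by_sub)
print(by_regmatches)

# Hypothetical input with no digits, to show the no-match behaviour.
no_match <- "Psychologist, online"

# sub() returns the input unchanged when the pattern does not match ...
sub_no_match <- sub(".*?([0-9]+\\s+\\w+).*", "\\1", no_match)

# ... while regmatches() returns an empty character vector.
rm_no_match <- regmatches(no_match, regexpr("[0-9]+\\s+\\w+", no_match))

print(sub_no_match)
print(length(rm_no_match))
```

Both calls should give "7 years" for the example string. The difference worth keeping in mind is the no-match case: sub() hands back the original string, so the regmatches() form may be easier to check when some inputs might not contain the pattern.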
