Reputation: 93
I am trying to extract a substring by pattern using the gsub() function in R.
# Example: extracting "7 years" substring.
string <- "Psychologist - 7 years on the website, online"
gsub(pattern="[0-9]+\\s+\\w+", replacement="", string)
[1] "Psychologist -  on the website, online"
As you can see, it's easy to exclude the needed substring using gsub(), but I need to invert the result and get only "7 years". I was thinking about using "^", something like this:
gsub(pattern="[^[0-9]+\\s+\\w+]", replacement="", string)
Could anyone please help me with the correct regexp pattern?
Upvotes: 6
Views: 1605
Reputation: 43179
You could use the opposite of \d, which is \D in R:
string <- "Psychologist - 7 years on the website, online"
sub(pattern = "\\D*(\\d+\\s+\\w+).*", replacement = "\\1", string)
# [1] "7 years"
\D* matches as many non-digit characters as possible; the part after it is captured in a group, and that group then replaces the complete string.
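One thing worth knowing about this approach: when the pattern does not match at all, sub() returns the input unchanged rather than an empty result. The following is a minimal sketch, assuming you want NA for such inputs; the vector `strings` and the variable names are made up for illustration and are not part of the original question.

```r
# Hypothetical sample inputs; only the first one comes from the question.
strings <- c(
  "Psychologist - 7 years on the website, online",
  "Coach - 12 months on the website",
  "No duration given"
)

# Same pattern as above: skip non-digits, capture "<digits> <word>", drop the rest.
pattern <- "\\D*(\\d+\\s+\\w+).*"

# sub() leaves non-matching strings untouched, so check with grepl() first
# and return NA where nothing could be extracted.
extracted <- ifelse(grepl(pattern, strings),
                    sub(pattern, "\\1", strings),
                    NA_character_)

print(extracted)
# [1] "7 years"   "12 months" NA
```

Without the grepl() guard, the third element would come back as the full original string, which is easy to mistake for a successful extraction.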
Upvotes: 4
Reputation: 627219
You may use
sub(pattern=".*?([0-9]+\\s+\\w+).*", replacement="\\1", string)
Details
.*? - any 0+ chars, as few as possible
([0-9]+\\s+\\w+) - Capturing group 1:
    [0-9]+ - one or more digits
    \\s+ - 1 or more whitespaces
    \\w+ - 1 or more word chars
.* - the rest of the string (any 0+ chars, as many as possible)
The \1 in the replacement replaces the whole match with the contents of Group 1.
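To see why the leading .*? is lazy, it may help to compare it with a greedy .* on a number that has more than one digit. This is only an illustrative sketch; the "17 years" input is a made-up variation of the string from the question.

```r
# Hypothetical variation of the question's string with a two-digit number.
string <- "Psychologist - 17 years on the website, online"

# Lazy prefix: .*? consumes as little as possible, so the group gets all digits.
lazy <- sub(".*?([0-9]+\\s+\\w+).*", "\\1", string)

# Greedy prefix: .* consumes as much as possible and leaves only the
# last digit for [0-9]+ to match.
greedy <- sub(".*([0-9]+\\s+\\w+).*", "\\1", string)

print(lazy)    # [1] "17 years"
print(greedy)  # [1] "7 years"
```

With a single-digit number such as "7 years" both variants give the same result, which is why the difference is easy to overlook.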
Upvotes: 7