Reputation: 17001
How to extract only DATABASE_NAME from this string using POSIX-style regular expressions?
st <- "MICROSOFT_SQL_SERVER.DATABASE\INSTANCE.DATABASE_NAME."
First of all, this generates an error
Error: '\I' is an unrecognized escape in character string starting "MICROSOFT_SQL_SERVER.DATABASE\I"
I was thinking something like
sub(".*\\.", st, "")
Upvotes: 2
Views: 1530
Reputation: 179468
Other answers provided some really good alternative ways of cracking the problem using strsplit
or str_split
.
However, if you really want to use a regex and gsub
, this solution substitutes the first two occurrences of a (string followed by a period) with an empty string.
Note the use of the ?
modifier to tell the regex not to be greedy, as well as the {2}
modifier to tell it to repeat the expression in brackets two times.
gsub("\\.", "", gsub("(.+?\\.){2}", "", st))
[1] "DATABASE_NAME"
Upvotes: 2
Reputation: 179468
An alternative approach is to use str_split
in package stringr
. The idea is to split st into strings at each period, and then to isolate the third string:
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."
library(stringr)
str_split(st, "\\.")[[1]][3]
[1] "DATABASE_NAME"
Upvotes: 1
Reputation: 174853
The first problem is that you need to escape the \
in your string:
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."
As for the main problem, this will return the bit you want from the string you gave:
> sub("\\.$", "", sub("[A-Za-z0-9\\._]*\\\\[A-Za-z]*\\.", "", st))
[1] "DATABASE_NAME"
But a simpler solution would be to split on the \\.
and select the last chunk:
> strsplit(st, "\\.")[[1]][3]
[1] "DATABASE_NAME"
or slightly more automated
> sst <- strsplit(st, "\\.")[[1]]
> tail(sst, 1)
[1] "DATABASE_NAME"
Upvotes: 3