Reputation: 34703
This is pretty basic, but I can't seem to find how to return the already-matched expression in regexes in R.
For example, suppose I wanted to add a period after an initial, changing "Joe J Smith" to "Joe J. Smith".
My approach is to use gsub("(?<=\\s|^)[A-Z](?=\\s|$)","\\1.",string,perl=T). (I'm no expert on regex, but I thought \\1 or $1 would return the matched expression, i.e. "J" for the string given.)
For nought, though, as this returns: "Joe . Smith"
I'm sure this is simple, but I can't find any examples trying to do something similar in R, which has its own brand of base regex.
Upvotes: 1
Views: 598
Reputation: 9687
As akrun indicated, you need to parenthesize the capital letter to form a capture group. This is what ?regex says:
The backreference '\N', where 'N = 1 ... 9', matches the substring
previously matched by the Nth parenthesized subexpression of the
regular expression. (This is an extension for extended regular
expressions: POSIX defines them only for basic ones.)
Adding the parens gives this example:
R>x
[1] "joe J smith"
R>gsub("(?<=\\s|^)([A-Z])(?=\\s|$)","\\1.",x,perl=TRUE)
[1] "joe J. smith"
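To make the difference concrete, here is a small sketch that runs the ungrouped and the grouped pattern on the string from the question. The variable names (no_group, with_group) are only illustrative and do not come from the original posts.

```r
x <- "Joe J Smith"

# No capture group in the pattern: "\\1" refers to nothing,
# so the matched letter is replaced by an empty string plus "."
no_group <- gsub("(?<=\\s|^)[A-Z](?=\\s|$)", "\\1.", x, perl = TRUE)

# Parentheses around [A-Z] create group 1, so "\\1" now holds the initial
with_group <- gsub("(?<=\\s|^)([A-Z])(?=\\s|$)", "\\1.", x, perl = TRUE)

print(no_group)    # "Joe . Smith"
print(with_group)  # "Joe J. Smith"
```

The lookarounds only assert context and consume nothing, so the parentheses around [A-Z] are the only thing that changes between the two calls.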
Upvotes: 2
Reputation: 269431
In this case you can use "\\b" to refer to word boundaries:
> gsub("\\b([A-Z])\\b", "\\1.", "Joe J Smith")
[1] "Joe J. Smith"
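Since gsub is vectorized, the word-boundary pattern can be wrapped in a small helper and applied to a whole character vector. The sketch below assumes a hypothetical helper name, add_initial_period, and made-up sample names. It also shows one caveat: an initial that already has a period would get a second one, which a negative lookahead (with perl = TRUE) is one possible way to avoid.

```r
# Hypothetical helper: appends "." to any single capital letter
# that stands alone as a word
add_initial_period <- function(names) {
  gsub("\\b([A-Z])\\b", "\\1.", names)
}

people <- c("Joe J Smith", "Ann B Lee", "Sam Jones")
fixed <- add_initial_period(people)
print(fixed)  # "Joe J. Smith" "Ann B. Lee" "Sam Jones"

# Caveat: "J." already ends in a period, but \\b still matches
# between "J" and ".", so the plain pattern doubles it
doubled <- add_initial_period("Joe J. Smith")
print(doubled)  # "Joe J.. Smith"

# One possible guard (an assumption, not from the answer):
# skip initials that are already followed by a period
guarded <- gsub("\\b([A-Z])\\b(?!\\.)", "\\1.",
                c("Joe J Smith", "Joe J. Smith"), perl = TRUE)
print(guarded)  # "Joe J. Smith" "Joe J. Smith"
```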
Regarding capitalizing the letter after a hyphen:
> gsub("(-.)", "\\U\\1", "Joe Jones-smith", perl = TRUE)
[1] "Joe Jones-Smith"
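The "\\U" in the replacement is documented in ?gsub as available only with perl = TRUE, alongside "\\L" (lower case) and "\\E" (end case conversion). A brief sketch of how these might be combined follows; the input strings and variable names are invented for illustration.

```r
# Upper-case the first letter of every word, including the one after a hyphen.
# \\b matches before "j", "j" and "s" because "-" is not a word character.
title_case <- gsub("\\b([a-z])", "\\U\\1", "joe jones-smith", perl = TRUE)
print(title_case)  # "Joe Jones-Smith"

# \\E switches case conversion off again, so only group 1 is upper-cased
first_upper <- gsub("(\\w+) (\\w+)", "\\U\\1\\E \\2", "joe smith", perl = TRUE)
print(first_upper)  # "JOE smith"

# \\L lower-cases the backreference that follows it
all_lower <- gsub("(\\w+)", "\\L\\1", "JOE Smith", perl = TRUE)
print(all_lower)  # "joe smith"
```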
Upvotes: 5