Colin

Reputation: 303

gsub only part of pattern

I want to use gsub to correct some names that are in my data. I want names such as "R. J." and "A. J." to have no space between the letters.

For example:

x <- "A. J. Burnett"

I want to use gsub to match the pattern of his first name, and then remove the space:

gsub("[A-Z]\\.\\s[A-Z]\\.", "[A-Z]\\.[A-Z]\\.", x)

But I get:

[1] "[A-Z].[A-Z]. Burnett"

Obviously, instead of the [A-Z]'s I want the actual letters in the original name. How can I do this?

Upvotes: 11

Views: 10682

Answers (2)

Jota

Reputation: 17611

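You can use a look-ahead, (?=\\w\\.), and a look-behind, (?<=\\b\\w\\.), to match only the space between two initials and replace it with "". Look-arounds need perl = TRUE, because R's default regex engine does not support them.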

x <- c("A. J. Burnett", "Dr. R. J. Regex")
gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", x, perl = TRUE)
# [1] "A.J. Burnett"   "Dr. R.J. Regex"

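The look-behind requires a word boundary (\\b), a word character (\\w), and a period (\\.) immediately before the space; the look-ahead requires a word character and a period immediately after it. Neither one consumes any characters, so only the space itself is replaced.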
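To see what the \\b in the look-behind contributes, here is a small sketch comparing the pattern with and without it. The variable names with_boundary and without_boundary are illustrative only.

```r
x <- c("A. J. Burnett", "Dr. R. J. Regex")

# With \\b, the look-behind only accepts a single word character before the
# period, so the space after "Dr." is left alone.
with_boundary <- gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", x, perl = TRUE)

# Without \\b, the "r." at the end of "Dr." also satisfies the look-behind,
# so the space after "Dr." is removed as well.
without_boundary <- gsub("(?<=\\w\\.) (?=\\w\\.)", "", x, perl = TRUE)

print(with_boundary)
# [1] "A.J. Burnett"   "Dr. R.J. Regex"
print(without_boundary)
# [1] "A.J. Burnett"  "Dr.R.J. Regex"
```

Note that \\w also matches lowercase letters, digits, and underscores; if only capital initials should qualify, [A-Z] could be swapped in for \\w.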

Upvotes: 2

janos

Reputation: 124646

Wrap the parts of the pattern you want to keep in capture groups, (...), and refer to the text each group matched with \\1, \\2, and so on in the replacement. In this example:

x <- "A. J. Burnett"
gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", x)
# [1] "A.J. Burnett"

Also note that in the replacement you don't need to escape the . characters, as they don't have a special meaning there.
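One thing to watch for, if your data might contain names with three or more initials: each match of the two-group pattern consumes both initials, so a third initial is left with its space. The sketch below compares it with a variant that uses a single capture group plus a look-ahead; that variant is an assumption about what you might want, not part of the original question, and the second test name is made up for illustration.

```r
x <- c("A. J. Burnett", "J. R. R. Tolkien")

# Two capture groups: each match consumes both initials, so the second
# initial cannot start another match.
two_groups <- gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", x)

# One capture group plus a look-ahead (needs perl = TRUE): the following
# initial is only checked, not consumed, so it can begin the next match.
one_group <- gsub("([A-Z]\\.)\\s(?=[A-Z]\\.)", "\\1", x, perl = TRUE)

print(two_groups)
# [1] "A.J. Burnett"    "J.R. R. Tolkien"
print(one_group)
# [1] "A.J. Burnett"   "J.R.R. Tolkien"
```

Both calls are vectorized, so they can be applied directly to a whole column of names.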

Upvotes: 18
