Reputation: 2076
I want to replace every single-character word in my string with a blank. My idea was that such a word has a space before and after it, so I put spaces before and after the character class in the pattern, but that doesn't seem to work. I would also like to replace words longer than one character: for example, how would the code change if I wanted to replace all words of length 2?
str="I have a cat of white color"
str=gsub("([[:space:]][[a-z]][[:space:]])", "", str)
Upvotes: 2
Views: 2121
Reputation: 626835
I want to replace every single-character word in my string with a blank. My idea was that such a word has a space before and after it.
That idea is not correct: a word is not always surrounded by spaces. What if the word is at the beginning of the string? Or at the end? Or is followed by punctuation?
Use a \b word boundary:
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
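The three positions above can be checked with a small sketch. The input string here is made up for illustration and is not from the question; it puts a single-letter word at the start, before punctuation, and at the end.

```r
# Illustrative input (an assumption, not from the original post).
x <- "a cat, b! and c"

# \b matches at the string edges and next to punctuation, so all three
# single-letter words are found without requiring surrounding spaces.
hits <- regmatches(x, gregexpr("\\b[a-z]\\b", x, perl = TRUE))[[1]]
print(hits)    # "a" "b" "c"

# Requiring a literal space on both sides finds none of them.
spaced <- regmatches(x, gregexpr(" [a-z] ", x, perl = TRUE))[[1]]
print(spaced)  # character(0)
```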
Note that in R, when you use gsub, it is best to use it with the PCRE regex engine (pass perl=T):
POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
So, to match all 1-letter words, you need to use
gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=T) ## To replace 1 ASCII letter words
Note that (?i) is a case-insensitive modifier (making a match both a and A).
To match 2-letter words instead, use:
gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=T) ## To replace 2 ASCII letter words
Here, we are using a limiting quantifier, {min,max} or {n} (exactly n times), to specify how many times the pattern quantified with this construct can be repeated.
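As a rough illustration of the quantifier forms, the sketch below tests a few sample words (chosen here for illustration) against {2}, {2,3} and {3,}. The ^ and $ anchors are added only so that each whole element must match.

```r
# Sample words, assumed for illustration.
words <- c("a", "of", "cat", "have", "white")

# {2} means exactly two repetitions of [a-z].
exactly_two <- words[grepl("^[a-z]{2}$", words, perl = TRUE)]

# {2,3} means at least two and at most three repetitions.
two_to_three <- words[grepl("^[a-z]{2,3}$", words, perl = TRUE)]

# {3,} means three or more repetitions.
three_plus <- words[grepl("^[a-z]{3,}$", words, perl = TRUE)]

print(exactly_two)   # "of"
print(two_to_three)  # "of" "cat"
print(three_plus)    # "cat" "have" "white"
```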
See IDEONE demo:
> input = "I am a football fan"
> gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=T) ## To replace 1 ASCII letter words
[1] "REPLACEMENT am REPLACEMENT football fan"
> gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=T) ## To replace 2 ASCII letter words
[1] "I REPLACEMENT a football fan"
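If the replacement is an empty string, as in the question, the spaces around each removed word stay behind. A minimal sketch of one way to tidy that up, assuming you want single spaces and no leading or trailing whitespace (the variable names are just for illustration):

```r
input <- "I have a cat of white color"

# Remove 1-letter words; the spaces that surrounded them are left behind.
removed <- gsub("(?i)\\b[a-z]\\b", "", input, perl = TRUE)
print(removed)  # " have  cat of white color"

# Collapse runs of whitespace to one space, then trim both ends.
cleaned <- trimws(gsub("\\s+", " ", removed, perl = TRUE))
print(cleaned)  # "have cat of white color"
```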
Upvotes: 2
Reputation: 521249
You need to use a regex quantifier, e.g. [a-z]{2}, which matches two consecutive letters in the range a to z. The regex pattern you want is something along the lines of this:
\\s[a-z]{2}\\s
You can build this regex dynamically in R using an input number of characters. Here is a code snippet which demonstrates this:
str <- "I have a cat of white color"
nchars <- 2
exp <- paste0("\\s[a-z]{", nchars, "}\\s")
> gsub(exp, "", str)
[1] "I have a catwhite color"
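Because \\s consumes the spaces on both sides, the neighbouring words end up joined ("catwhite"), and words at the very start or end of the string are never matched. A possible variant, sketched here with a hypothetical helper remove_words_of_length (not part of base R or of the answers above), builds the pattern the same way but with \\b and then tidies the leftover whitespace:

```r
# Hypothetical helper: removes ASCII-letter words of exactly n characters.
remove_words_of_length <- function(text, n) {
  # Build the pattern dynamically, e.g. "(?i)\\b[a-z]{2}\\b" for n = 2.
  pattern <- paste0("(?i)\\b[a-z]{", n, "}\\b")
  stripped <- gsub(pattern, "", text, perl = TRUE)
  # Collapse the doubled spaces left behind and trim the ends.
  trimws(gsub("\\s+", " ", stripped, perl = TRUE))
}

two <- remove_words_of_length("I have a cat of white color", 2)
one <- remove_words_of_length("I have a cat of white color", 1)
print(two)  # "I have a cat white color"
print(one)  # "have cat of white color"
```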
Upvotes: 1