Chirayu Chamoli

Reputation: 2076

How to replace single- or double-character words in a string

I want to replace every single-character word in my string with a blank. My idea was that such a word has a space before and after it, so I put spaces before and after the character class in the pattern, but that doesn't seem to work. I would also like to replace words longer than one character: for example, if I want to replace all words of length 2, how would the code change?

str="I have a cat of white color"
str=gsub("([[:space:]][[a-z]][[:space:]])", "", str)

Upvotes: 2

Views: 2121

Answers (2)

Wiktor Stribiżew

Reputation: 626835

I want to replace every single-character word in my string with a blank. My idea was that such a word has a space before and after it.

That idea is not correct: a word is not always surrounded by spaces. What if the word is at the beginning of the string? Or at the end? Or is followed by punctuation?

Use the \b word boundary instead:

There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
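A minimal sketch of the three cases above, using made-up sample strings (the vector `x` and the `_` replacement are only for illustration). It compares a space-delimited pattern with a word-boundary pattern:

```r
# Three inputs where the one-letter word sits at a different kind of boundary
x <- c("a cat", "cat a", "cat, a. dog")

# Space-delimited pattern: needs a space on both sides, so it finds nothing here
spaced <- gsub(" [a-z] ", "_", x)

# Word-boundary pattern: matches at the start, at the end, and next to punctuation
bounded <- gsub("\\b[a-z]\\b", "_", x, perl = TRUE)

print(spaced)   # unchanged: "a cat" "cat a" "cat, a. dog"
print(bounded)  # "_ cat" "cat _" "cat, _. dog"
```

Keep in mind that \b treats an apostrophe as a non-word character, so the t in "don't" would also count as a one-letter word.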

Note that in R it is best to use gsub with the PCRE regex engine (pass perl=TRUE):

POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).

So, to match all 1-letter words, you need to use

gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=TRUE) ## Replace 1-letter ASCII words

Note that (?i) is a case-insensitive modifier, so [a-z] matches both a and A.
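To illustrate what the modifier changes, here is a small sketch on the demo sentence. The `ignore.case = TRUE` argument of gsub is shown as an alternative to the inline (?i) flag; the variable names are only for illustration:

```r
input <- "I am a football fan"

# Without the modifier only the lowercase "a" is a match
case_sensitive <- gsub("\\b[a-z]\\b", "_", input, perl = TRUE)

# Inline (?i) flag: "I" is matched as well
inline_flag <- gsub("(?i)\\b[a-z]\\b", "_", input, perl = TRUE)

# Same effect through the ignore.case argument of gsub
arg_flag <- gsub("\\b[a-z]\\b", "_", input, perl = TRUE, ignore.case = TRUE)

print(case_sensitive)  # "I am _ football fan"
print(inline_flag)     # "_ am _ football fan"
print(arg_flag)        # "_ am _ football fan"
```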

To match 2-letter words, use:

gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=TRUE) ## Replace 2-letter ASCII words

Here, we are using a limiting quantifier, {n} for an exact count or {min,max} for a range (written without a space after the comma), to specify how many times the subpattern it follows is repeated. {2} means exactly two letters.
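As a rough sketch of the quantifier forms, the following applies an exact count, a range, and an open-ended minimum to the same sentence. The `X` replacement and the variable names are only for illustration:

```r
input <- "I am a football fan"

# {2}: exactly two letters
exactly_two <- gsub("(?i)\\b[a-z]{2}\\b", "X", input, perl = TRUE)

# {1,2}: one or two letters
one_or_two <- gsub("(?i)\\b[a-z]{1,2}\\b", "X", input, perl = TRUE)

# {4,}: four letters or more
four_plus <- gsub("(?i)\\b[a-z]{4,}\\b", "X", input, perl = TRUE)

print(exactly_two)  # "I X a football fan"
print(one_or_two)   # "X X X football fan"
print(four_plus)    # "I am a X fan"
```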

See IDEONE demo:

> input = "I am a football fan"
> gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=TRUE) ## Replace 1-letter ASCII words
[1] "REPLACEMENT am REPLACEMENT football fan"
> gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=TRUE) ## Replace 2-letter ASCII words
[1] "I REPLACEMENT a football fan"

Upvotes: 2

Tim Biegeleisen

Reputation: 521249

You need a regex quantifier, e.g. [a-z]{2}, which matches two consecutive lowercase letters. The regex pattern you want is something along the lines of this:

\\s[a-z]{2}\\s

You can build this regex dynamically in R using an input number of characters. Here is a code snippet which demonstrates this:

str <- "I have a cat of white color"
nchars <- 2
exp <- paste0("\\s[a-z]{", nchars, "}\\s")

> gsub(exp, "", str)
[1] "I have a catwhite color"
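Because the \\s on both sides consumes the surrounding spaces, the neighbouring words get glued together ("catwhite"), and a word at the start or end of the string is never matched. One possible variant, sketched here with a hypothetical helper named `remove_words_of_length`, builds the pattern with word boundaries instead and tidies the leftover whitespace afterwards:

```r
# Hypothetical helper: remove whole words made of exactly n ASCII letters
remove_words_of_length <- function(text, n) {
  # Build the pattern dynamically, e.g. "(?i)\\b[a-z]{2}\\b" for n = 2
  pattern <- paste0("(?i)\\b[a-z]{", n, "}\\b")
  stripped <- gsub(pattern, "", text, perl = TRUE)
  # Collapse the whitespace left behind and trim both ends
  trimws(gsub("[[:space:]]+", " ", stripped))
}

sentence <- "I have a cat of white color"
one_letter <- remove_words_of_length(sentence, 1)
two_letter <- remove_words_of_length(sentence, 2)

print(one_letter)  # "have cat of white color"
print(two_letter)  # "I have a cat white color"
```

Whether to collapse the whitespace is a design choice; drop that last step if the original spacing has to be preserved.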

Upvotes: 1
