Reputation: 1238
In a string column of a data frame, how can I check every row for stray single letters of the alphabet and remove them?
Example
I am a text r r o n n r and here
and I would like this as output:
I am a text and here
Upvotes: 2
Views: 665
Reputation: 887651
The condition is not very clear, but if only the single letters r, o and n need to be removed:
gsub('\\b[ron] ', '', txt)
#[1] "I am a text and here"
Or using a more general approach
gsub("(?<=\\b\\K[a-z]) [a-z] ", "", txt, perl = TRUE)
#[1] "I am a text and here"
Or, more simply
gsub('\\b[a-z] [a-z] ', '', txt)
#[1] "I am a text and here"
Or with str_remove_all
library(stringr)
str_remove_all(txt, "\\b[ron] ")
#[1] "I am a text and here"
Data:
txt <- "I am a text r r o n n r and here"
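Since the question asks about a column rather than a single string, it may help to note that gsub() is vectorised over its input, so the same pattern can be applied to every row at once. A minimal base R sketch, assuming a hypothetical data frame `df` with a character column `text` (both names are made up for illustration):

```r
# Hypothetical data frame; `df` and `text` are illustrative names only
df <- data.frame(
  text = c("I am a text r r o n n r and here",
           "no stray letters in this row"),
  stringsAsFactors = FALSE
)

# gsub() processes every element of the column in a single call
df$text <- gsub("\\b[ron] ", "", df$text)
df$text
# [1] "I am a text and here"         "no stray letters in this row"
```

Rows without stray letters are returned unchanged, so no separate per-row check is needed.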
Upvotes: 1
Reputation: 160677
gsub("\\s[A-Za-z](?= )", "", "I am a text r r o n n r and here", perl = TRUE)
# [1] "I am text and here"
Since you want to preserve the single "a", you can use any of the following for more-specific patterns:
### just three letters: r o n
gsub("\\s[orn](?= )", "", "I am a text r r o n n r and here", perl = TRUE)
# [1] "I am a text and here"
### any single-letter except "a" and "i"
gsub("\\s[B-HJ-Zb-hj-z](?= )", "", "I am a text r r o n n r and here", perl = TRUE)
# [1] "I am a text and here"
(The exception for "i" in the second example is not strictly needed here, but provided as an example.)
A "look-ahead", (?= ), is used because you stated the requirement for a space before and after. If you use a pattern of "\\s[orn]\\s", then it will miss many of the single letters, because each match consumes the trailing space and leaves no leading space for the letter that follows. If you relax this a little, then you can use word-boundaries, as in
gsub("\\s[B-HJ-Zb-hj-z]\\b", "", "I am a text r r o n n r and here")
(While using perl-style regexes can technically have a performance penalty, I suspect that it really only matters if you're processing a lot of data and need to improve performance as much as you can. Avoiding them is not strictly required, and "premature optimization is the root of all evil" (Donald Knuth).)
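To make the difference between these variants concrete, here is a small side-by-side sketch on the string from the question (the variable name `x` is only for illustration). The first call consumes the trailing space with each match, so every second stray letter survives and gets glued to the preceding word.

```r
x <- "I am a text r r o n n r and here"

# Trailing \\s is consumed by each match, so the next letter has no
# leading space left and is skipped
gsub("\\s[orn]\\s", "", x)
# [1] "I am a textrnr and here"

# Look-ahead: the trailing space is checked but not consumed
gsub("\\s[orn](?= )", "", x, perl = TRUE)
# [1] "I am a text and here"

# Word boundary: zero-width as well, and works without perl = TRUE
gsub("\\s[B-HJ-Zb-hj-z]\\b", "", x)
# [1] "I am a text and here"
```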
Note: In the [B-HJ-Zb-hj-z] patterns above, I'm also excluding a lower-case "i" in addition to the upper-case "I"; if you are confident that you will never see an otherwise-valid "i", then you can adjust your pattern to use [B-HJ-Zb-z] instead. (Thanks to @jay.sf for highlighting this assumption.)
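As a rough illustration of that note, the sketch below runs both character classes on a made-up string (not from the question) that contains a lone lower-case "i". The name `y` is hypothetical, and perl = TRUE is used here only so that the character ranges do not depend on the locale.

```r
# Made-up input with a lone lower-case "i"
y <- "here i am a text r o n end"

# Class excludes a, i, A and I: the lone "i" is kept
gsub("\\s[B-HJ-Zb-hj-z]\\b", "", y, perl = TRUE)
# [1] "here i am a text end"

# Class excludes only a, A and I: the lone "i" is removed as well
gsub("\\s[B-HJ-Zb-z]\\b", "", y, perl = TRUE)
# [1] "here am a text end"
```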
Upvotes: 5
Reputation: 1261
You can do that with the stringr package as follows:
library(stringr)
library(dplyr)
# Create a data frame with one character column
data <- data.frame(
  A = c("I am a text r r o n n r and here")
)
# Remove the stray single letters r, o and n from the column
data %>%
  mutate(A = str_replace_all(A, "\\b[ron] \\b", ""))
# A
# I am a text and here
Upvotes: 1