Nathalie

Reputation: 1238

Remove single alphabetic letters from strings

In a data frame column of strings, how can I check every row for stand-alone single letters of the alphabet and remove them?

Example

I am a text r r o n n r and here

The desired output is:

I am a text and here

Upvotes: 2

Views: 665

Answers (3)

akrun

Reputation: 887651

The condition is not very clear, but if only the single letters r, o and n need to be removed:

gsub('\\b[ron] ', '',  txt)
#[1] "I am a text and here"

Or using a more general approach

gsub("(?<=\\b\\K[a-z]) [a-z] ", "", txt, perl = TRUE)
#[1] "I am a text and here"

Or, more simply (note that this removes the single letters two at a time, so it relies on them coming in pairs)

gsub('\\b[a-z] [a-z] ', '', txt)
#[1] "I am a text and here"

Or with str_remove_all

library(stringr)
str_remove_all(txt, "\\b[ron] ")
#[1] "I am a text and here"

data

txt <- "I am a text r r o n n r and here"
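Since the question is about a column rather than a single string, here is a minimal sketch of the same idea applied row by row. gsub is vectorised over its input, so no loop is needed. The data frame df and its column name text are made up for illustration.

```r
# Hypothetical data frame; the name `df` and the column `text` are assumptions.
df <- data.frame(
  text = c("I am a text r r o n n r and here",
           "no stray letters in this row"),
  stringsAsFactors = FALSE
)

# gsub() processes every element of the column in one call
df$text <- gsub("\\b[ron] ", "", df$text)

df$text[1]
#[1] "I am a text and here"
```

Rows without stand-alone r, o or n are returned unchanged.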

Upvotes: 1

r2evans

Reputation: 160677

gsub("\\s[A-Za-z](?= )", "", "I am a text r r o n n r and here", perl = TRUE)
# [1] "I am text and here"

Since you want to preserve the single a, you can use any of the following for more-specific patterns:

### just three letters: r o n
gsub("\\s[orn](?= )", "", "I am a text r r o n n r and here", perl = TRUE)
# [1] "I am a text and here"

### any single-letter except "a" and "i"
gsub("\\s[B-HJ-Zb-hj-z](?= )", "", "I am a text r r o n n r and here", perl = TRUE)
# [1] "I am a text and here"

(The exception for i in the second example is not strictly needed here, but provided as an example.)

The look-ahead (?= ) is used because you stated the requirement for a space before and after each letter. A pattern such as "\\s[orn]\\s" consumes the trailing space, so that space is no longer available as the leading space of the next letter; because matches cannot overlap, every other single letter is missed. If you relax this a little, then you can use word-boundaries, as in

gsub("\\s[B-HJ-Zb-hj-z]\\b", "", "I am a text r r o n n r and here")
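To illustrate why the look-ahead matters, here is a small comparison on the example string (the variable name txt is only for illustration). The first pattern consumes the trailing space, the second only checks for it.

```r
txt <- "I am a text r r o n n r and here"

# Consuming the trailing space: it cannot also serve as the leading
# space of the next letter, so every other single letter survives.
gsub("\\s[orn]\\s", "", txt)
# [1] "I am a textrnr and here"

# Look-ahead: the trailing space is checked but not consumed,
# so each single letter is matched in turn.
gsub("\\s[orn](?= )", "", txt, perl = TRUE)
# [1] "I am a text and here"
```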

(While using perl-style regexes can technically have a performance penalty, I suspect that it only matters if you are processing a lot of data and need to improve performance as much as you can. Avoiding them is not strictly required, and "premature optimization is the root of all evil" - Donald Knuth.)

Note: In this last pattern, I'm also excepting a lower-case i in addition to the upper-case I; if you are confident that you will never see an otherwise-valid i, then you can adjust your pattern to use [B-HJ-Zb-z] instead. (Thanks to @jay.sf for highlighting this assumption.)
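As a rough sketch of the difference between the two character classes, consider a made-up string x that contains a stand-alone lower-case i (perl = TRUE is used here only so that both calls are handled by the same regex engine):

```r
# Hypothetical input with a stand-alone lower-case "i"
x <- "so i x said y hello"

# Lower-case i excluded from the class: the lone "i" is kept
gsub("\\s[B-HJ-Zb-hj-z]\\b", "", x, perl = TRUE)
# [1] "so i said hello"

# Lower-case i included in the class: the lone "i" is removed as well
gsub("\\s[B-HJ-Zb-z]\\b", "", x, perl = TRUE)
# [1] "so said hello"
```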

Upvotes: 5

Nareman Darwish

Reputation: 1261

You can do that with the stringr package as follows:

library(stringr)
library(dplyr)

# Create dataframe with column
data <-
  data.frame(
    A = c("I am a text r r o n n r and here")
  )

# Remove the stand-alone letters r, o and n (each followed by a space) from column A
data %>%
  mutate(A = str_replace_all(A, "\\b[ron] \\b", ""))

#                      A
# 1 I am a text and here

Upvotes: 1
