How to remove a specific pattern from a string in R?

Question

I have this string (for example).

str <- "T gwed is atyrt mtt yfdgfg grter effgf y"

I want to remove the single-letter words from this string (the 'T' at the start and the 'y' at the end in this case), so the output should be

"gwed is atyrt mtt yfdgfg grter effgf"

I used this

str <- gsub("[A-Za-z] ", "", str)

But it gives this as a result.

[1] "gweiatyrmtyfdgfgrteeffgy"

The pattern also matches the last letter of every word when it is followed by a space (for example the "d " in "gwed "), so it merges all the words of the string.

How do I achieve this?

Also, I have this huge text with thousands of strings (not just a single string), so keep this in mind while providing an answer.

Henrik · Accepted Answer

str <- "T gwed is atyrt mtt yfdgfg grter effgf y"

gsub(" ?\\<[[:alpha:]]\\> ?", "", str)

## [1] "gwed is atyrt mtt yfdgfg grter effgf"

You need to use the special characters that denote word boundaries, i.e., \< and \> (written as \\< and \\> inside an R string, because the backslash itself has to be escaped). The _? (where _ is a space) means that a single space on either side of the single letter is removed as well, if present. See ?regex for more.
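
Regarding the "thousands of strings" part of the question: gsub() is vectorized, so the same call works on a whole character vector. Note that a pattern which consumes the space on both sides of a letter would join the neighbouring words when the single letter sits in the middle of a string. The sketch below is one possible variant, not the answer's original code: it removes the letter only and tidies the leftover spaces afterwards. The vector `texts` and the helper `remove_single_letters` are made-up names for illustration, and `\\b` with `perl = TRUE` is a swapped-in way of writing the word boundary.

```r
# Hypothetical sample data standing in for the "thousands of strings".
texts <- c(
  "T gwed is atyrt mtt yfdgfg grter effgf y",
  "a quick b test",
  "no single letters here"
)

# Illustrative helper (not from the original answer).
remove_single_letters <- function(x) {
  # Drop every letter that forms a word on its own.
  x <- gsub("\\b[[:alpha:]]\\b", "", x, perl = TRUE)
  # Collapse the runs of spaces left behind, then trim both ends.
  x <- gsub(" +", " ", x)
  trimws(x)
}

# One vectorized call handles every element of the vector.
cleaned <- remove_single_letters(texts)
print(cleaned)
```

Because `\\b` also treats an apostrophe as a word boundary, a string such as "don't" would lose its "t" with this sketch; if your text contains contractions, the pattern would need adjusting.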
