user3664020

How to remove a specific pattern from a string in R?

I have this string, for example:

str <- "T gwed is atyrt mtt yfdgfg grter effgf y"

I want to remove single-letter words from this string ('T' at the start and 'y' at the end in this case), so the output should be:

"gwed is atyrt mtt yfdgfg grter effgf"

I tried this:

str <- gsub("[A-Za-z] ", "", str)

But it gives this result:

[1] "gweiatyrmtyfdgfgrteeffgy"

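The pattern also matches the last letter of every longer word followed by a space (e.g. the "d " in "gwed "), so it strips those letters together with the spaces and merges all the words of the string.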
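To see exactly which substrings the pattern [A-Za-z] hits, a quick diagnostic with base R's regmatches() and gregexpr() (a sketch added for illustration, not part of the original question) lists every match. Each one is a single letter followed by a space, which for every word except the first is that word's last letter:

```r
str <- "T gwed is atyrt mtt yfdgfg grter effgf y"

# gregexpr() locates every non-overlapping match of the pattern;
# regmatches() extracts the matched substrings as a list (one element per input string)
m <- regmatches(str, gregexpr("[A-Za-z] ", str))
print(m[[1]])
# [1] "T " "d " "s " "t " "t " "g " "r " "f "
```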

How can I achieve this?

Also, I have a huge text with thousands of strings (not just a single string), so please keep this in mind when answering.

Upvotes: 1

Answers (2)

Henrik

str <- "T gwed is atyrt mtt yfdgfg grter effgf y"

gsub(" ?\\<[[:alpha:]]\\> ?", "", str)

## [1] "gwed is atyrt mtt yfdgfg grter effgf"

You need the special sequences that denote word boundaries, i.e. \\< (beginning of a word) and \\> (end of a word), so that [[:alpha:]] only matches a letter that forms a whole word on its own. The _? (where _ is a space) means that a single space on either side of the letter, if present, is removed as well. See ?regex for more.
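Because gsub() is vectorized, the same call works directly on a character vector holding thousands of strings. One caveat that follows from how the pattern is written: when a single letter sits between two words, the spaces on both sides are removed and the neighbouring words are glued together. A minimal sketch of one workaround, assuming plain space-separated text, removes only the letters and then tidies the spacing with gsub() and trimws():

```r
x <- c("T gwed is atyrt mtt yfdgfg grter effgf y", "foo a bar")

# Original pattern: correct at the ends, but merges words around a letter in the middle
gsub(" ?\\<[[:alpha:]]\\> ?", "", x)
# [1] "gwed is atyrt mtt yfdgfg grter effgf" "foobar"

# Alternative: drop single-letter words, collapse runs of spaces, then trim both ends
cleaned <- trimws(gsub(" +", " ", gsub("\\<[[:alpha:]]\\>", "", x)))
cleaned
# [1] "gwed is atyrt mtt yfdgfg grter effgf" "foo bar"
```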

Upvotes: 3

agstudy

Another option, without using regular expressions:

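xx <- unlist(strsplit(str, " ", fixed = TRUE))  # fixed = TRUE: split on a literal space, no regex
paste(xx[nchar(xx) > 1], collapse = " ")       # keep only tokens longer than one character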

[1] "gwed is atyrt mtt yfdgfg grter effgf"
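As written, unlist() flattens the pieces of every input string into one vector, so this only works for a single string. A sketch for a vector of many strings, using a hypothetical helper named drop_single_chars(), keeps each string's pieces separate and pastes each one back with vapply():

```r
# Hypothetical helper: removes every space-separated token shorter than 2 characters
drop_single_chars <- function(x) {
  pieces <- strsplit(x, " ", fixed = TRUE)  # list: one character vector per input string
  vapply(pieces,
         function(w) paste(w[nchar(w) > 1], collapse = " "),
         character(1))                      # guarantees one string back per input
}

res <- drop_single_chars(c("T gwed is atyrt mtt yfdgfg grter effgf y", "foo a bar"))
res
# [1] "gwed is atyrt mtt yfdgfg grter effgf" "foo bar"
```

Note that the nchar() filter drops any one-character token, including digits or punctuation, not only letters.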

Upvotes: 1
