R regex: remove apostrophes except those preceded and followed by a letter

Question

I'm cleaning a text and I'd like to remove every apostrophe except those preceded and followed by letters, such as in i'm, i'll, he's, etc.

I have the following preliminary solution, which handles many cases, but I want a better one:

rmAps <- function(x) gsub("^\'+| \'+|\'+ |[^[:alpha:]]\'+(a-z)*|\b\'*$", " ", x)

rmAps("'i'm '' ' 'we end' '")
[1] " i'm   we end  "

I also tried:

(?



But I think I am still missing something.

Jota · Accepted Answer

gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)



Remove occurrences where the apostrophe is not followed by a word character: '(?!\w)

Remove occurrences where the apostrophe is not preceded by a word character: (?<!\w)'


If either of those situations occurs, you want to remove the apostrophe, so '(?!\w)|(?<!\w)' should do the trick. Just note that \w matches digits and the underscore as well as letters, so adjust the pattern as necessary.
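A minimal sketch of the lookaround approach wrapped in reusable functions, assuming R's PCRE engine (perl = TRUE), which the lookbehind requires. The names rm_aps_word and rm_aps_alpha are hypothetical. The second variant swaps \w for the POSIX class [[:alpha:]], one possible way to follow the "adjust as necessary" advice so that only letters protect an apostrophe, as the question title asks:

```r
# Hypothetical helpers illustrating the lookaround answer
# (perl = TRUE is needed for the lookbehind).

# Drop an apostrophe unless a word character (letter, digit or underscore)
# sits on both sides of it.
rm_aps_word <- function(x) {
  gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
}

# Same idea, but only letters count, so digits and underscores
# no longer protect an apostrophe.
rm_aps_alpha <- function(x) {
  gsub("'(?![[:alpha:]])|(?<![[:alpha:]])'", "", x, perl = TRUE)
}

rm_aps_word("'i'm '' ' 'we end' '")
# returns "i'm   we end "

y <- c("i'm", "he's", "2'3", "a_'b")
rm_aps_word(y)   # returns c("i'm", "he's", "2'3", "a_'b")
rm_aps_alpha(y)  # returns c("i'm", "he's", "23", "a_b")
```

With \w, "2'3" and "a_'b" keep their apostrophes; with [[:alpha:]] they lose them, while contractions such as i'm and he's survive either way.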




Another option is:

gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)


Here you match every apostrophe surrounded by word characters, \w'\w, and then force that match to fail with (*SKIP)(*FAIL), which also tells the engine to resume searching after the skipped text instead of retrying inside it. The alternative |' then matches any other apostrophe. As a result, only apostrophes not wrapped in word characters are matched and removed.
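A small sketch comparing the two answers, assuming R with perl = TRUE; the names rm_aps_skip, rm_aps_lookaround and txt are hypothetical. Both give the same result on the question's example, but they can differ when apostrophes are chained, as in "rock'n'roll": after \w'\w consumes "k'n", the search resumes at the second apostrophe, where \w'\w can no longer match because the "n" before it lies behind the new starting position, so the plain ' alternative removes it. The lookaround version checks each apostrophe's neighbours independently and keeps both:

```r
# Hypothetical helpers contrasting the two answers (both need perl = TRUE).

# (*SKIP)(*FAIL) version: a protected \w'\w sequence is consumed and then
# skipped, so the search resumes right after it.
rm_aps_skip <- function(x) {
  gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)
}

# Lookaround version: each apostrophe's neighbours are checked on their own.
rm_aps_lookaround <- function(x) {
  gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
}

txt <- "'i'm '' ' 'we end' '"
identical(rm_aps_skip(txt), rm_aps_lookaround(txt))  # TRUE

rm_aps_skip("rock'n'roll")        # returns "rock'nroll"
rm_aps_lookaround("rock'n'roll")  # returns "rock'n'roll"
```

If chained contractions can occur in your text, the lookaround pattern is the safer of the two.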
