Reputation: 1527
I'm able to remove all punctuation from a string while keeping apostrophes, but I'm now stuck on how to remove any apostrophes that are not between two letters.
str1 <- "I don't know 'how' to remove these ' things"
The result should look like this:
"I don't know how to remove these things"
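The question doesn't show the first step (removing all punctuation except apostrophes). A minimal sketch of one common way to do it, assuming a negated bracket expression with POSIX classes fits your input (the sample string `x` is made up):

```r
x <- "Hello, world! I don't know 'how'."
# Keep letters/digits ([:alnum:]), whitespace ([:space:]) and the apostrophe;
# every other character is deleted.
no_punct <- gsub("[^[:alnum:][:space:]']", "", x)
no_punct   # "Hello world I don't know 'how'"
```

The apostrophes that survive this step are the ones the answers below deal with.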
Upvotes: 2
Views: 2237
Reputation: 627292
You may use a regex approach:
str1 <- "I don't know 'how' to remove these ' things"
gsub("\\s*'\\B|\\B'\\s*", "", str1)
See this IDEONE demo and a regex demo.
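To see exactly what the pattern removes, you can list its matches with base R's `gregexpr()` and `regmatches()`. This is a quick inspection sketch using the same pattern and the same (default, non-Perl) engine as the `gsub()` call:

```r
str1 <- "I don't know 'how' to remove these ' things"
pat <- "\\s*'\\B|\\B'\\s*"
# Every substring gsub() would delete, in order of appearance
removed <- regmatches(str1, gregexpr(pat, str1))[[1]]
removed   # three matches: "'", "'" and " '" (the last one takes the preceding space)
```

The apostrophe in "don't" is never matched, because both alternatives require a non-word boundary on one side of it.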
The regex matches:
\\s*'\\B - 0+ whitespaces, ' and a non-word boundary
| - or
\\B'\\s* - a non-word boundary, ' and 0+ whitespaces

If you do not need to care about the extra whitespace that can remain after removing standalone ', you can use a PCRE regex like
\b'\b(*SKIP)(*F)|'
See the regex demo.
Explanation:
\b'\b - match a ' between word characters
(*SKIP)(*F) - and omit the match
| - or match...
' - an apostrophe in another context.

See an IDEONE demo:
gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
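This pattern deletes only the apostrophe itself, so the standalone ' in "these ' things" leaves two spaces behind. One way to tidy that up, assuming it is acceptable to collapse every whitespace run to a single space, is a second `gsub()` pass:

```r
str1 <- "I don't know 'how' to remove these ' things"
step1 <- gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl = TRUE)
step1   # "I don't know how to remove these  things" (note the double space)
# Collapse runs of whitespace into one space, then trim both ends
step2 <- trimws(gsub("\\s+", " ", step1))
step2   # "I don't know how to remove these things"
```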
To account for apostrophes between Unicode letters, add the (*UTF)(*UCP) flags at the start of the pattern and use the perl=TRUE argument:
gsub("(*UTF)(*UCP)\\s*'\\B|\\B'\\s*", "", str1, perl=TRUE)
Or
gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
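A quick check of the Unicode variant on a made-up string with accented letters, "l'été 'où' ' là". The `\u` escapes are only there to keep the snippet independent of the script's file encoding:

```r
# l'ete 'ou' ' la, with accented vowels written as \u escapes;
# only the first apostrophe sits between two letters
str2 <- "l'\u00e9t\u00e9 'o\u00f9' ' l\u00e0"
res2 <- gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str2, perl = TRUE)
res2   # "l'été où  là": the apostrophe in l'été is kept, the stray ones are gone
```

With (*UCP), letters such as é count as word characters, so \b'\b still recognizes the apostrophe in "l'été" as sitting inside a word.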
Upvotes: 5
Reputation: 38510
This method using gsub works:
gsub("(([^A-Za-z])'|'([^A-Za-z]))", "\\2 ", str1)
"I don't know  how to remove these   things"
This leaves extra spaces behind, so a second pass is needed to collapse them:
gsub(" +", " ", gsub("(([^A-Za-z])'|'([^A-Za-z]))", "\\2 ", str1))
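Each alternative in this pattern needs an actual non-letter character on one side of the apostrophe, so an apostrophe at the very start (or end) of the string, next to a letter, is not matched. A quick check with a made-up input, in case your strings can begin or end with a quote:

```r
x <- "'hello' world"
# Same two-pass approach as above
two_pass <- gsub(" +", " ", gsub("(([^A-Za-z])'|'([^A-Za-z]))", "\\2 ", x))
two_pass   # "'hello world": the leading apostrophe survives
```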
Upvotes: 4
Reputation: 109994
Here's one approach using lookarounds in base R:
gsub("(?<![a-zA-Z])(')|(')(?![a-zA-Z])", "", str1, perl=TRUE)
## [1] "I don't know how to remove these  things"
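Since `gsub()` is vectorised, the same lookaround pattern can be applied to a whole character vector at once. A sketch with made-up inputs; note that only the apostrophe is removed, so a standalone ' still leaves a double space:

```r
x <- c("I don't know 'how' to remove these ' things",
       "'quoted' text",
       "rock 'n' roll")
# Drop ' when it is not preceded by a letter, or not followed by a letter
out <- gsub("(?<![a-zA-Z])(')|(')(?![a-zA-Z])", "", x, perl = TRUE)
out
```

The lookbehind also succeeds at the very start of a string, which is why the leading quote in "'quoted' text" is removed.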
Upvotes: 3