Reputation:
I'd like to strip all punctuation from a block of imported text except for the apostrophe ('), such as the one in doesn't.
I currently have
$words = preg_replace('/[^a-z]+/i', '', $words);
This strips all the punctuation, but I'm unsure how to keep the '.
How can I achieve this?
Upvotes: 1
Views: 4234
Reputation: 4906
Try this:
preg_replace( '/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $words )
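This should remove every character that is neither a word character (\w: a letter, digit, or underscore) nor an apostrophe, and also any apostrophe that does not have a word character on both sides.
I haven't tested it yet, so please let me know if it works.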
Update:
To remove numbers as well, use this regex:
/[^\w\']+|\'(?!\w)|(?<!\w)\'|\d+/
The only change is the added \d+ alternative, which matches runs of digits so that they are removed too.
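For illustration, a minimal sketch of how the first pattern behaves on a made-up sample string (the input and variable names are hypothetical, not from the question). Since whitespace is not a word character, the pattern removes spaces as well. The second call adds \s to the character class to keep them, which is only my assumption about what is wanted, not part of the answer.

```php
<?php
// Hypothetical sample input (plain ASCII).
$sample = "Don't stop, 'quoted' words!";

// Pattern from the answer: removes non-word characters other than ',
// plus apostrophes that do not sit between two word characters.
$stripped = preg_replace('/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $sample);
echo $stripped, "\n";  // Don'tstopquotedwords

// Assumed variant: \s added to the class so that whitespace survives.
$spaced = preg_replace('/[^\w\'\s]+|\'(?!\w)|(?<!\w)\'/', '', $sample);
echo $spaced, "\n";    // Don't stop quoted words
```

The apostrophe in Don't stays because it has a word character on each side, while the quotes around quoted are dropped.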
Upvotes: 0
Reputation: 91375
To remove punctuation characters using a Unicode property, do:
preg_replace('/\p{Punctuation}+/u', '', $words);
or
preg_replace('/\p{P}+/u', '', $words);
To remove all punctuation except single quote:
preg_replace("/[^\P{P}']+/u", '', $words);
Have a look here.
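As a rough sketch of what the last pattern does, here it is applied to a made-up UTF-8 string (the sample text and variable names are hypothetical). Two things are worth noticing: symbols such as + and = belong to the Unicode symbol categories rather than \p{P}, so they are left alone, and the typographic apostrophe U+2019 is itself punctuation, so it is removed unless it is added to the exclusions. The second pattern is my own extension under that assumption.

```php
<?php
// Hypothetical sample; the \u{...} escapes need PHP 7 or later.
$sample = "It's \u{201C}done\u{201D}, isn't it? a+b=c";

// Pattern from the answer: all punctuation except the ASCII apostrophe.
$result = preg_replace("/[^\P{P}']+/u", '', $sample);
echo $result, "\n";  // It's done isn't it a+b=c

// A curly apostrophe (U+2019) counts as punctuation and is stripped...
$curly = "doesn\u{2019}t";
$lost = preg_replace("/[^\P{P}']+/u", '', $curly);
echo $lost, "\n";    // doesnt

// ...unless it is excluded too (assumed extension, not from the answer).
$kept = preg_replace('/[^\P{P}\'\x{2019}]+/u', '', $curly);
```

Here $kept is the same string as $curly, because the only punctuation in it is now excluded from the match.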
Upvotes: 1
Reputation: 338148
/(?!'\b)[[:punct:]] ?/
This matches any punctuation character, together with one optional space after it, unless the character is an apostrophe followed by a word character (the \b right after an apostrophe can only match when a word character comes next).
See http://rubular.com/r/VJ0J5c25vc
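The Rubular link tests the pattern in Ruby. As a hedged sketch, assuming the same pattern is carried over unchanged to PHP's preg_replace, this is how it behaves on a made-up ASCII sample (the input and variable name are hypothetical):

```php
<?php
// Hypothetical sample input (plain ASCII).
$sample = "Hello, world! It's 'fine'.";

// Same pattern as above, with the apostrophe escaped for the PHP string.
$result = preg_replace('/(?!\'\b)[[:punct:]] ?/', '', $sample);
echo $result, "\n";  // HelloworldIt's 'fine
```

Two side effects show up here: the optional space in the pattern removes the space after each punctuation mark, and an opening quote directly before a word is kept because it is also followed by a word boundary.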
Upvotes: 1
Reputation: 354406
You can use
(?!')\p{P}
to match any punctuation character except an apostrophe. preg_replace already replaces every match, so PHP has no g modifier and only u is needed. E.g.
preg_replace('/(?!\')\p{P}/u', '', $str);
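A small sketch of the lookahead version on a made-up string (the sample text and variable names are hypothetical):

```php
<?php
// Hypothetical sample input.
$str = "Wait... what's this?";

// (?!') rejects the apostrophe before \p{P} gets a chance to match it;
// every other punctuation character is removed one at a time.
$clean = preg_replace('/(?!\')\p{P}/u', '', $str);
echo $clean, "\n";  // Wait what's this
```

Unlike the character-class patterns in the other answers, this one matches a single character per match, which makes no difference to the result when the replacement is the empty string.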
Upvotes: 0