user1871245

Regex to strip all punctuation except '

I'd like to strip all punctuation from a block of text that I import except for ', such as the ' in doesn't.

I currently have

$words = preg_replace('/[^a-z]+/i', '', $words);

This strips all the punctuation, but I'm unsure how to make it keep '.

How can I achieve this?

Upvotes: 1

Views: 4234

Answers (4)

bukart

Reputation: 4906

Try this:

preg_replace( '/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $words )

This should remove all non-word characters, as well as apostrophes that are not inside a word.

It's untested, so please let me know if it works.

Update

To remove numbers too, use this regex:

/[^\w\']+|\'(?!\w)|(?<!\w)\'|\d+/

I just added \d+, so numbers are matched and removed as well.
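
For anyone who wants to try this out, here is a minimal sketch of both patterns in use. The function names are made up for illustration; only the patterns come from this answer.

```php
<?php
// Hypothetical helper names; the patterns are the two from the answer above.

function strip_keep_inner_apostrophes(string $words): string
{
    // [^\w']+   runs of characters that are neither word characters nor apostrophes
    // '(?!\w)   an apostrophe that is not followed by a word character
    // (?<!\w)'  an apostrophe that is not preceded by a word character
    return preg_replace('/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $words);
}

function strip_keep_inner_apostrophes_no_digits(string $words): string
{
    // Same as above, plus \d+ to drop runs of digits.
    return preg_replace('/[^\w\']+|\'(?!\w)|(?<!\w)\'|\d+/', '', $words);
}

echo strip_keep_inner_apostrophes("Don't stop, 'now'!"), "\n";        // Don'tstopnow
echo strip_keep_inner_apostrophes_no_digits("It's 2 o'clock"), "\n";  // It'so'clock
```

Note that, like the pattern in the question, both variants also remove whitespace, because a space is neither a word character nor an apostrophe. Underscores are kept, since \w includes them.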

Upvotes: 0

Toto

Reputation: 91375

To remove punctuation characters using a Unicode property, do:

 preg_replace('/\p{Punctuation}+/u', '', $words);

or

 preg_replace('/\p{P}+/u', '', $words);

To remove all punctuation except single quote:

 preg_replace("/[^\P{P}']+/u", '', $words);

Have a look here.
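
As a rough sketch of how the last pattern behaves (the function name is just an illustration, not part of any library):

```php
<?php
// Hypothetical helper name; the pattern is the third one from the answer above.
function strip_punctuation_except_apostrophe(string $words): string
{
    // [^\P{P}'] is a double negative: "not (non-punctuation or apostrophe)",
    // i.e. any Unicode punctuation character other than the apostrophe.
    // preg_replace returns null on error (for example invalid UTF-8 with /u),
    // so fall back to the unchanged input in that case.
    return preg_replace("/[^\P{P}']+/u", '', $words) ?? $words;
}

echo strip_punctuation_except_apostrophe("Hello, world! It doesn't matter."), "\n";
// Hello world It doesn't matter

echo strip_punctuation_except_apostrophe('a+b=$5!'), "\n";
// a+b=$5
```

Unlike the pattern in the question, this leaves spaces and digits alone. The second call also shows that \p{P} does not cover symbols such as +, = and $, which belong to the Unicode symbol categories (\p{S}) rather than punctuation.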

Upvotes: 1

Tomalak

Reputation: 338148

/(?!'\b)[[:punct:]] ?/

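This matches any punctuation character unless it's an apostrophe followed by a word character (the \b word boundary after the apostrophe implies that a word character follows). The trailing " ?" also consumes one optional space after the punctuation.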

See http://rubular.com/r/VJ0J5c25vc
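
The Rubular link tests the pattern in Ruby. PHP's PCRE also understands [[:punct:]], so a minimal PHP sketch could look like this (the function name is hypothetical, and using an empty replacement string is an assumption about the intended use):

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function strip_punct_keep_leading_apostrophes(string $words): string
{
    // (?!'\b)      skip an apostrophe that is directly followed by a word character
    // [[:punct:]]  any ASCII punctuation character
    // " ?"         plus one optional space after it
    return preg_replace('/(?!\'\b)[[:punct:]] ?/', '', $words);
}

echo strip_punct_keep_leading_apostrophes("Don't stop, 'now'!"), "\n";
// Don't stop'now

echo strip_punct_keep_leading_apostrophes('a+b=$5!'), "\n";
// ab5
```

Two things are visible here: an apostrophe at the start of a word is kept as well, and the optional space means the blank after a comma disappears together with the comma. [[:punct:]] also covers ASCII symbols such as + and $.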

Upvotes: 1

Joey

Reputation: 354406

You can use

(?!')\p{P}

to match any punctuation except an apostrophe. E.g.

preg_replace('/(?!\')\p{P}/u', '', $str);

Upvotes: 0
