mkuk

Reputation: 300

How to use regex to remove punctuation in a sentence

I am trying to extract all the valid words from a file. Valid words are made up of ordinary letters and may contain an apostrophe, like these:

don't won't can't

I have to ignore commas, periods, and exclamation points.

I have an expression that gets just the letters, but now it won't match words like don't, can't, or won't.

This is the expression I am using: "[^A-Za-z]+". I have also tried "\'[^A-Za-z]+", but that breaks and allows all characters. Does anyone have any idea what I can use to match normal words, including contractions such as don't, won't, and can't?
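
The question doesn't name a language, but the description suggests the pattern is used as a separator in a split call. A minimal sketch under that assumption, in Python purely for illustration (the sample sentence is made up), shows why contractions fall apart, why the attempted pattern splits nothing in this sample, and how moving the apostrophe inside the negated class keeps it in words:

```python
import re

TEXT = "I don't know, but I won't stop. You can't!"

# Splitting on runs of characters that are NOT letters: the apostrophe
# counts as a separator, so every contraction is cut in two.
letters_only = re.split(r"[^A-Za-z]+", TEXT)

# The attempted pattern: an apostrophe followed by one or more non-letters.
# Here every apostrophe is followed by a letter, so nothing is split at all.
attempted = re.split(r"'[^A-Za-z]+", TEXT)

# Putting the apostrophe inside the negated class keeps it as part of words.
with_apostrophe = re.split(r"[^A-Za-z']+", TEXT)

# re.split leaves an empty string when the text ends with a separator ("!").
with_apostrophe = [word for word in with_apostrophe if word]

print(letters_only)
print(attempted)
print(with_apostrophe)
```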

Thank you very much

Upvotes: 1

Views: 1141

Answers (3)

Razor

Reputation: 17498

This will match letters from any language, plus apostrophes (and, as written, ! and ?), and exclude numbers.

\b[\p{L}\!\'\?]+

Here is a very good resource for regular expressions. http://www.regular-expressions.info/
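
A rough Python translation of this pattern, for illustration only. Python's built-in re module does not support \p{L}, so the sketch substitutes [^\W\d_] ("word characters that are not digits or underscores"), a common approximation of "any letter". The sample text and variable names are hypothetical. The sketch also shows that ! and ? end up inside matched words, so a variant without them may suit the question better:

```python
import re

# Stand-in for \p{L}: Python's re has no \p{...} support, and [^\W\d_]
# approximates "any Unicode letter" for str patterns.
LETTER = r"[^\W\d_]"

# The posted pattern, translated: letters plus ! ' ? after a word boundary.
as_posted = re.compile(r"\b(?:" + LETTER + r"|[!'?])+")

# Variant without ! and ?, since the question asks to ignore exclamation points.
letters_and_apostrophes = re.compile(r"\b(?:" + LETTER + r"|')+")

TEXT = "Straße 42: don't stop, can't!"

print(as_posted.findall(TEXT))                # "can't!" keeps its "!"
print(letters_and_apostrophes.findall(TEXT))  # digits and "!" are dropped
```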

Upvotes: 0

FriendFX

Reputation: 3079

Another way (using the \w shorthand character class) is: \b[\w']+
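
Note that \w matches digits and underscores as well as letters, so this pattern is broader than a letters-only class. A small Python sketch (the sample text is made up for illustration):

```python
import re

TEXT = "I don't have 2 cats_and dogs!"

# \b[\w']+ : a word boundary, then a run of word characters or apostrophes.
# \w includes digits and "_", so "2" and "cats_and" are returned as words.
words = re.findall(r"\b[\w']+", TEXT)

print(words)
```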

Upvotes: 0

Matt Lacey

Reputation: 8255

[^A-Za-z] means any character NOT in those ranges! Try this instead:

[A-Za-z']

You may need to escape the single quote, in which case you'll probably also need to escape the backslash that escapes it:

[A-Za-z\\']
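
The escaping advice applies when the pattern is written inside an ordinary string literal. A minimal sketch in Python (an assumption, since the question doesn't name a language) showing that the three spellings behave the same, with a + quantifier added so each match is a whole word rather than a single character:

```python
import re

TEXT = "Don't panic, we won't stop!"

# The same character class spelled three ways in Python source code.
plain = "[A-Za-z']+"       # an apostrophe needs no escape inside a regex class
escaped = "[A-Za-z\\']+"   # ordinary string: "\\" becomes a single backslash
raw = r"[A-Za-z\']+"       # raw string: the backslash reaches the regex as-is

# Unlike the negated class [^A-Za-z], this class matches the characters of a
# word, so findall returns the words themselves.
results = [re.findall(pattern, TEXT) for pattern in (plain, escaped, raw)]

for result in results:
    print(result)
```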

Upvotes: 1
