Reputation: 300
I am trying to extract all the valid words from a file. A valid word is made up of ordinary letters and may contain an apostrophe, like so:
don't won't can't
I also have to ignore commas, periods, and exclamation points.
I have gotten the expression to match only letters, but now it won't pick up words like don't, can't, or won't.
This is the expression I am using: "[^A-Za-z]+"
I have also tried "\'[^A-Za-z]+"
but this breaks and lets every character through. Does anyone have an idea of what I can use to match normal words, including ones like don't, won't, and can't?
Thank you very much
Upvotes: 1
Views: 1141
Reputation: 17498
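This will match letters from any language and exclude numbers, provided your regex engine supports Unicode property classes such as \p{L}. Note that the character class also admits !, ', and ?; drop \! and \? if that punctuation should be ignored.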
\b[\p{L}\!\'\?]+
Here is a very good resource for regular expressions: http://www.regular-expressions.info/
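For engines without \p{L}, here is a rough sketch of the same idea. It assumes Python and its built-in re module, neither of which the question mentions, and the name unicode_words is made up for illustration. The class [^\W\d_] (word characters minus digits and underscore) is only an approximation of \p{L}, and ! and ? are deliberately left out so that punctuation is ignored.

```python
import re

# Assumption: Python's built-in re module, which does not support \p{L}.
# [^\W\d_] approximates "any letter": word characters minus digits and underscore.
# The alternation adds the apostrophe so contractions stay in one piece.
UNICODE_WORD = re.compile(r"(?:[^\W\d_]|')+")

def unicode_words(text):
    """Return runs of letters and apostrophes found in text."""
    return UNICODE_WORD.findall(text)

print(unicode_words("\u00c7a va? Don't, se\u00f1or!"))
```

Digits, commas, periods, question marks, and exclamation points all act as separators here.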
Upvotes: 0
Reputation: 8255
[^A-Za-z]
Would mean anything NOT matching those character ranges! Try this:
[A-Za-z']
You may need to escape the single quote, in which case you'll probably also need to escape the backslash that escapes it:
[A-Za-z\\']
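To show the difference in practice, here is a minimal sketch. It assumes Python, which the question does not specify, and the names WORD_PATTERN and extract_words are made up for illustration. A + quantifier is added so that each match is a whole word rather than a single character.

```python
import re

# Positive character class: one or more ASCII letters or apostrophes.
# In a Python raw string the apostrophe needs no escaping inside the class.
WORD_PATTERN = re.compile(r"[A-Za-z']+")

def extract_words(text):
    """Return the words in text, keeping apostrophes and skipping other punctuation."""
    return WORD_PATTERN.findall(text)

print(extract_words("Don't stop, won't quit. Can't lose!"))
```

One caveat: this also keeps stray apostrophes, such as the single quotes around a quoted word. If that matters, a stricter pattern such as [A-Za-z]+(?:'[A-Za-z]+)* only accepts an apostrophe between letters.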
Upvotes: 1