Reputation: 1932
I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:
(\w+([-'])(\w+)?[']?(\w+))
and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optional groups, but perhaps there's a more efficient way?
Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
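A minimal Python sketch for reproducing the failure (Python's `re` module is an assumption here, since the question doesn't name a language, and the sample string is just built from the examples above). It reads `group(0)` so the whole match is reported rather than the capture groups:

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"(\w+([-'])(\w+)?[']?(\w+))")

def full_matches(text):
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(full_matches("Mary's High-school qu'est-ce"))
```

Here "qu'est-ce" comes back truncated to "qu'est", because the pattern allows only one hyphen-or-apostrophe slot followed by one optional apostrophe.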
Upvotes: 34
Views: 89810
Reputation: 388
Another riff on similar answers:
/\b\w+([-']\w+)*\b/g
\b # word boundary
\w+ # at least one word char
( # followed by a group that:
[-'] # starts with a hyphen or apostrophe
\w+ # followed by at least one word char
)* # and this group can appear any number of times (including zero)
\b # word boundary
In my case, I needed to exclude words that start with apostrophes or hyphens, and also words with those characters repeated.
But terms like Stratford-upon-Avon are ok.
Note: I did not need to account for words starting or ending with an apostrophe.
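A quick check in Python, as a sketch only. The pattern above is written as a JavaScript-style literal; using Python's `re` module and making the group non-capturing (so that `findall` returns whole matches) are substitutions made for this example:

```python
import re

# Same pattern as above, with the group made non-capturing so that
# re.findall returns whole matches instead of the last captured group.
WORD = re.compile(r"\b\w+(?:[-']\w+)*\b")

samples = "Mary's High-school 'tis Chambers' Qu'est-ce Stratford-upon-Avon"
words = WORD.findall(samples)
print(words)

# Repeated separators split the word into two matches.
print(WORD.findall("some--thing"))
```

Leading and trailing apostrophes are dropped ('tis comes back as tis, Chambers' as Chambers), which is consistent with the note above.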
Upvotes: 0
Reputation: 1
Use
([\w]+[']*[\w]*)|([']*[\w]+)
It will properly parse
"You've and we i've it' '98"
(supports ' anywhere in the word, but a single ' on its own is ignored).
If needed, \w could be replaced with [a-zA-Z], etc.
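For anyone who wants to try it, a small Python sketch (the `re` module and the helper name `tokens` are assumptions made for illustration; the pattern itself is unchanged):

```python
import re

# The answer's pattern, unchanged.
PATTERN = re.compile(r"([\w]+[']*[\w]*)|([']*[\w]+)")

def tokens(text):
    # group(0) is the whole match, whichever alternative matched.
    return [m.group(0) for m in PATTERN.finditer(text)]

print(tokens("You've and we i've it' '98"))
print(tokens("a ' b"))  # the lone apostrophe is skipped
```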
Upvotes: 0
Reputation: 1
This worked for me:
([a-zA-Z]+'?-?[a-zA-Z]+(-?[a-zA-Z])?)|[a-zA-Z]
Upvotes: 0
Reputation: 17357
The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.
This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).
I believe the best way to represent this in a RegExp would be:
/\b[a-z]+(?:['-]?[a-z]+)*\b/
which is described as:
\b # word boundary
[a-z]+ # one or more letters
(?: # start non-capturing group
['-]? # zero or one apostrophe or hyphen
[a-z]+ # one or more letters
)* # end of non-capturing group, zero or more times
\b # word boundary
which will match any word that begins and ends with a letter and can contain zero or more groups of an optional apostrophe or hyphen followed by one or more letters. As written, [a-z] only covers lowercase, so add the i flag if capitalized words should match too.
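A hedged sketch of how this behaves, using Python's `re` module in place of the JavaScript-style literal (that swap, the helper name `find_words`, and the `ignore_case` switch are assumptions made for illustration):

```python
import re

# The pattern from this answer, without the surrounding slashes.
PATTERN = r"\b[a-z]+(?:['-]?[a-z]+)*\b"

def find_words(text, ignore_case=True):
    """Return all matches; ignore_case maps to re.IGNORECASE."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(PATTERN, text, flags)

print(find_words("Mary's Qu'est-ce high-school"))
print(find_words("Mary's", ignore_case=False))  # only the trailing "s" matches
print(find_words("some--thing"))                # split at the doubled hyphen
```

Without case-insensitive matching, the capital M prevents a match at the start of "Mary's", so only the "s" after the apostrophe is found.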
Upvotes: 5
Reputation: 7948
Use this pattern:
(?=\S*['-])([a-zA-Z'-]+)
(?= # Look-Ahead
\S # <not a whitespace character>
* # (zero or more)(greedy)
['-] # Character in ['-] Character Class
) # End of Look-Ahead
( # Capturing Group (1)
[a-zA-Z'-] # Character in [a-zA-Z'-] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
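A short Python sketch of what the look-ahead buys you (the `re` module and the sample sentence are assumptions for illustration): plain words are skipped, and only runs that contain a hyphen or apostrophe are returned.

```python
import re

PATTERN = re.compile(r"(?=\S*['-])([a-zA-Z'-]+)")

text = "Mary's dog 'tis Chambers' Qu'est-ce plain"
found = PATTERN.findall(text)  # one capturing group, so findall returns group 1
print(found)

# Caveat: a run made only of separators also matches.
print(PATTERN.findall("--"))
```

Note that leading and trailing apostrophes are kept ('tis, Chambers'), but nothing stops separators from being adjacent or standing alone.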
Upvotes: 47
Reputation: 2637
How about: \'?\w+([-']\w+)*\'?
I suppose these words shouldn't be matched:
something- or -something: starts or ends with -
some--thing or some'-thing: a - or ' not followed by a word character
some'': two apostrophes
Upvotes: 0
Reputation: 461
[\w'-]+
would match pretty much any occurrence of words with (or without) hyphens and apostrophes, but also in cases where those characters are adjacent.
(?:\w|['-]\w)+
should match cases where the characters can't be adjacent.
If you need to be sure that the word contains hyphens and/or apostrophes and that those characters aren't adjacent, maybe try \w*(?:['-](?!['-])\w*)+. But that would also match ' and - alone.
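To compare the three patterns side by side, here is a rough Python sketch (the `re` module, the variable names, and the sample strings are assumptions made for illustration):

```python
import re

LOOSE = re.compile(r"[\w'-]+")                       # separators may be adjacent
NO_ADJACENT = re.compile(r"(?:\w|['-]\w)+")          # each separator needs a word char after it
MUST_CONTAIN = re.compile(r"\w*(?:['-](?!['-])\w*)+")  # at least one separator required

print(LOOSE.findall("some--thing"))
print(NO_ADJACENT.findall("some--thing"))
print(MUST_CONTAIN.findall("Qu'est-ce plain"))
print(MUST_CONTAIN.findall("'"))  # a lone apostrophe still matches
```

The second pattern does not reject "some--thing" outright; it splits it into "some" and "-thing", so a leading hyphen can still end up in a match.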
Upvotes: 27
Reputation: 13974
debuggex.com is a great resource for visualizing these sorts of things
\b\w*[-']\w*\b
should do the trick
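A quick Python check, as a sketch (the `re` module and the sample strings are assumptions here). It suggests the pattern handles one hyphen or apostrophe per match, so words with two separators come back in pieces:

```python
import re

PATTERN = re.compile(r"\b\w*[-']\w*\b")

single = PATTERN.findall("Mary's High-school")
double = PATTERN.findall("Qu'est-ce")
print(single)
print(double)  # split into two matches at the second separator
```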
Upvotes: 11