Reputation: 13287
I want to pull out capitalized words that don't start a sentence along with the previous and following word.
I'm using:
(\w*)\b([A-Z][a-z]\w*)\b(\w*)
replace with:
$1 -- $2 -- $3
Edit: It's only returning the $2. Will try suggestions.
As for natural-language processing: I don't care about that here. I just want to see where capitals show up in a sentence so I can figure out whether they're proper nouns or not.
Upvotes: 2
Views: 1788
Reputation: 1873
How about this?
([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)
This doesn't take into account anything non-alphabetic though. It also assumes that all words are separated by a single whitespace character. You will need to modify it if you want more complex support.
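To see what this pattern returns, here is a minimal Python sketch. The function name `capitalized_with_neighbors` and the sample sentence are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: word, one whitespace character,
# capitalized word, one whitespace character, word.
PATTERN = re.compile(r"([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)")

def capitalized_with_neighbors(text):
    """Return (previous, capitalized, next) tuples for each match."""
    return PATTERN.findall(text)

sample = "We met Alice yesterday. Then we saw Bob again."
print(capitalized_with_neighbors(sample))
# [('met', 'Alice', 'yesterday'), ('saw', 'Bob', 'again')]
```

"Then" is skipped here only because the period after "yesterday" is not whitespace, which is the non-alphabetic limitation mentioned above.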
Upvotes: 2
Reputation: 336488
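Right now your regex fails because the \b can never match between two word characters. It matches only between an alphanumeric and a non-alphanumeric character; therefore, between \w* and [A-Z] (or before the final \w*) it can only match when that \w* matches nothing at all, which is why $1 and $3 come back empty.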
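A quick check of the original pattern in Python illustrates the effect (the sample sentence is made up): the only way for \b to succeed next to \w* is for that \w* to match the empty string, so groups 1 and 3 are empty.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\w*)\b([A-Z][a-z]\w*)\b(\w*)")

# findall returns one (group1, group2, group3) tuple per match.
result = ORIGINAL.findall("we met Alice yesterday")
print(result)
# [('', 'Alice', '')]
```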
So, you need some other (=non-alphanumeric) characters between your words:
Try
(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)
although (if your regex engine allows using Unicode properties), you might be happier with
(\w*)\W+(\p{Lu}\p{Ll}\w*)\W+(\w*)
As written, only capitalized words of length 2 or more are matched, i.e. "I" (as in "me") will not be matched by this. I suppose you inserted the [a-z] to avoid matches like "IBM"? Or what was your intention?
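For reference, a small Python sketch of the first (ASCII) pattern, including the "$1 -- $2 -- $3" replacement from the question. Python writes backreferences as \1, \2, \3, and its built-in re module does not support \p{Lu}, so the Unicode variant is not shown. The helper names `neighbors` and `annotate` are hypothetical.

```python
import re

# ASCII version of the suggested pattern: \W+ requires at least one
# non-word character between the neighbouring words.
FIXED = re.compile(r"(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)")

def neighbors(text):
    """Return (previous, capitalized, next) tuples for each match."""
    return [m.groups() for m in FIXED.finditer(text)]

def annotate(text):
    """Rewrite each match as 'previous -- capitalized -- next'."""
    return FIXED.sub(r"\1 -- \2 -- \3", text)

print(neighbors("we met Alice yesterday"))
# [('met', 'Alice', 'yesterday')]
print(annotate("we met Alice yesterday"))
# we met -- Alice -- yesterday
```

Note that \W+ also spans punctuation, so a capitalized word right after a period would still be matched; excluding sentence starts would need an extra check.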
Upvotes: 2