Reputation: 1759
I'm trying to use a regular expression in Java to match every group of one or more words that sits between special characters. Here are some sample strings to clarify:
{ ? <> <> ; <> ? ; <> ? . ? <> ? . ? <> ? . ? <> ? }
{ <> <> ? . <> <> ? }
{ <> <> <> }
{ OPTIONAL { <> <> ? } FILTER ( ! bound(?) ) }
{ FILTER not exists ( ! bound(?) ) }
{ <> <> ? . ? <> ? }
{ ? <> <> ; a <> }
{ <> <> ?@en }
{ <> <> <> }
{ <> <> ? . <> <> ? FILTER ( ? > ? ) }
{ <> <> ? . ? <> ? FILTER regex(? ?) }
{ <> <> ? FILTER ( ! bound(?) ) }
{ ? <> ? ; <> ? . ? <> ? }
{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }
{ ? <> <> ; <> ? . ? <> ? }
{ <> <> ? . <> <> ? FILTER ( ? = ? ) }
My matches should look like this:
OPTIONAL
FILTER
bound
FILTER not exists
bound
...
This is the regex I've come up with so far:
[^\d\W\\a\@]+
You can test it here: https://regex101.com/r/cP3Uri/2
My problem is that my regex only finds single words, not groups of words separated by spaces. For example, the substring FILTER not exists
gets 3 matches (one per word), but I want it to be a single match.
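A minimal Java sketch that reproduces the problem; the class and method names are illustrative, not from the question. In a Java string literal every backslash of the regex has to be doubled, so [^\d\W\\a\@]+ becomes "[^\\d\\W\\\\a\\@]+".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The question's pattern [^\d\W\\a\@]+ as a Java string literal (backslashes doubled).
    static final Pattern ORIGINAL = Pattern.compile("[^\\d\\W\\\\a\\@]+");

    // Hypothetical helper: returns every match of the pattern, in order of appearance.
    static List<String> matches(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [FILTER, not, exists, bound]: one match per word, not one per group.
        System.out.println(matches(ORIGINAL, "{ FILTER not exists ( ! bound(?) ) }"));
    }
}
```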
Can anyone help me find the correct regex?
Upvotes: 1
Views: 4292
Reputation: 37404
You can use [a-zA-Z]{2}[a-zA-Z ]*\\b
to find groups that start with a word of at least two letters:
[a-zA-Z]{2}: matches exactly 2 upper- or lower-case letters
[a-zA-Z ]*\\b: matches zero or more upper- or lower-case letters or spaces, followed by a word boundary
To match only runs of words that each have at least two letters, separated by whitespace, use
[a-zA-Z]{2}(?:\\s*[a-zA-Z]{2,})*
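A small Java sketch running both patterns from this answer over a few of the question's sample strings; the class, constant, and helper names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterGroupDemo {
    // Two letters, then any mix of letters and spaces, ending at a word boundary.
    static final Pattern LETTERS_AND_SPACES = Pattern.compile("[a-zA-Z]{2}[a-zA-Z ]*\\b");
    // Two letters, then further words of 2+ letters separated by optional whitespace.
    static final Pattern WORD_RUNS = Pattern.compile("[a-zA-Z]{2}(?:\\s*[a-zA-Z]{2,})*");

    // Hypothetical helper: returns every match of the pattern, in order of appearance.
    static List<String> matches(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                "{ OPTIONAL { <> <> ? } FILTER ( ! bound(?) ) }",
                "{ FILTER not exists ( ! bound(?) ) }",
                "{ <> <> ? . ? <> ? FILTER regex(? ?) }");
        for (String s : samples) {
            // Prints the matches of both patterns side by side.
            System.out.println(matches(LETTERS_AND_SPACES, s) + "  " + matches(WORD_RUNS, s));
        }
    }
}
```

On these samples both patterns agree: [OPTIONAL, FILTER, bound], [FILTER not exists, bound], and [FILTER regex]. They differ when a single letter follows a word: on a made-up input such as { FILTER a }, the first pattern returns FILTER a, while the second returns only FILTER because it requires every word to have at least two letters.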
Upvotes: 3
Reputation: 7880
You can use one of these, which respect your original pattern:
[^\d\W\\a\@]([^\d\W\\a\@]| )*\b
[^\d\W\\a\@]+( +[^\d\W\\a\@]+)*
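A minimal Java sketch of both patterns from this answer; building them from one string constant for the question's character class avoids mistyping the quadruple backslashes. The class and constant names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalClassGroupDemo {
    // The question's character class [^\d\W\\a\@] as a Java string literal.
    static final String WORD_CHAR = "[^\\d\\W\\\\a\\@]";
    // One class char, then class chars or spaces, ending at a word boundary.
    static final Pattern CHARS_OR_SPACES = Pattern.compile(WORD_CHAR + "(" + WORD_CHAR + "| )*\\b");
    // One or more words of class chars, separated by one or more spaces.
    static final Pattern WORDS_SEPARATED_BY_SPACES =
            Pattern.compile(WORD_CHAR + "+( +" + WORD_CHAR + "+)*");

    // Hypothetical helper: returns every match of the pattern, in order of appearance.
    static List<String> matches(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String withGroup = "{ FILTER not exists ( ! bound(?) ) }";
        String loneA = "{ ? <> <> ; a <> }";
        // Both patterns print [FILTER not exists, bound] for the first string
        // and [] for the second, since the class excludes lowercase a.
        System.out.println(matches(CHARS_OR_SPACES, withGroup) + "  " + matches(CHARS_OR_SPACES, loneA));
        System.out.println(matches(WORDS_SEPARATED_BY_SPACES, withGroup) + "  "
                + matches(WORDS_SEPARATED_BY_SPACES, loneA));
    }
}
```

Because the class excludes every lowercase a, not only the standalone word a, a word containing one is broken up: on a made-up input { data }, the second pattern returns [d, t] and the first returns nothing.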
Upvotes: 1
Reputation: 1502
\w+(?:\s*\w+)*
to match all groups, including single characters such as the a and the 2 (from ?2)
\w{2}(?:\s*\w+)*
to match only groups whose first word has at least two characters.
You can replace \w with [a-zA-Z] to exclude digits (and underscores).
see https://regex101.com/r/cP3Uri/7
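A short Java sketch of this answer's patterns, plus the [a-zA-Z] variant of the first one; the class and constant names are illustrative. It runs them on the sample with the lone a and the sample containing ?2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCharGroupDemo {
    // Any run of word characters, including single characters such as a or 2.
    static final Pattern ALL_GROUPS = Pattern.compile("\\w+(?:\\s*\\w+)*");
    // Groups whose first word has at least two word characters.
    static final Pattern MIN_TWO = Pattern.compile("\\w{2}(?:\\s*\\w+)*");
    // First pattern with \w replaced by [a-zA-Z], so digits and underscores are not matched.
    static final Pattern LETTERS_ONLY = Pattern.compile("[a-zA-Z]+(?:\\s*[a-zA-Z]+)*");

    // Hypothetical helper: returns every match of the pattern, in order of appearance.
    static List<String> matches(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String loneA = "{ ? <> <> ; a <> }";
        String withDigit = "{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }";
        // ALL_GROUPS finds [a] and [2]; MIN_TWO finds neither; LETTERS_ONLY finds [a] but not 2.
        for (Pattern p : new Pattern[] {ALL_GROUPS, MIN_TWO, LETTERS_ONLY}) {
            System.out.println(matches(p, loneA) + "  " + matches(p, withDigit));
        }
    }
}
```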
Upvotes: 2