Peter

Reputation: 1759

Regex for multiple words between special characters

I'm trying to use a regular expression in Java to find every group of one or more words that appears between special characters. Here are some sample strings to illustrate:

{ ? <> <> ; <> ? ; <> ? . ? <> ? . ? <> ? . ? <> ? }
{ <> <> ? . <> <> ? }
{ <> <> <> }
{ OPTIONAL { <> <> ? } FILTER ( ! bound(?) ) }
{ FILTER not exists ( ! bound(?) ) }
{ <> <> ? . ? <> ? }
{ ? <> <> ; a <> }
{ <> <> ?@en }
{ <> <> <> }
{ <> <> ? . <> <> ? FILTER ( ? > ? ) }
{ <> <> ? . ? <> ? FILTER regex(? ?) }
{ <> <> ? FILTER ( ! bound(?) ) }
{ ? <> ? ; <> ? . ? <> ? }
{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }
{ ? <> <> ; <> ? . ? <> ? }
{ <> <> ? . <> <> ? FILTER ( ? = ? ) }

My matches should look like this:

OPTIONAL
FILTER
bound
FILTER not exists
bound
...

This is the regex I've come up with so far:

[^\d\W\\a\@]+

You can test it here: https://regex101.com/r/cP3Uri/2

My problem is that my regex only finds single words, not groups of words separated by spaces. For example, the substring FILTER not exists gets three matches (one per word), but I want it to be a single match.
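A minimal Java sketch that reproduces the problem, assuming the pattern above is simply passed to `Pattern.compile` and scanned with `Matcher.find`. The class name `OriginalRegexDemo` and the `findAll` helper are hypothetical, introduced only for illustration; the input is one of the sample strings. Inside the character class, `\\a` is a literal backslash followed by the letter `a`, so lowercase `a` is excluded too, which is why the lone `a` in `; a <>` never matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalRegexDemo {
    // The question's pattern [^\d\W\\a\@]+ as a Java string literal: runs of word
    // characters that are not digits and not the letter 'a' (the backslash and '@'
    // are already non-word characters). A space ends every match.
    static final Pattern SINGLE_WORDS = Pattern.compile("[^\\d\\W\\\\a\\@]+");

    // Hypothetical helper: collects every non-overlapping match in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [FILTER, not, exists, bound]: the phrase is split into three matches.
        System.out.println(findAll(SINGLE_WORDS, "{ FILTER not exists ( ! bound(?) ) }"));
    }
}
```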

Can anyone help me find the correct regex?

Upvotes: 1

Views: 4292

Answers (3)

Pavneet_Singh

Reputation: 37404

You can use [a-zA-Z]{2}[a-zA-Z ]*\\b (escaped for a Java string literal) to find word groups that start with a word of at least two characters:

  • [a-zA-Z]{2} : match exactly two upper- or lower-case letters
  • [a-zA-Z ]*\\b : match zero or more upper- or lower-case letters or spaces, followed by a word boundary
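A minimal Java sketch of this pattern run against a few of the question's sample strings, one line at a time. The class name `PhraseRegexDemo` and the `findAll` helper are hypothetical. The trailing `\b` makes the engine give back a space that `[a-zA-Z ]*` swallowed before `(` or `}`, so matches end on the last letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseRegexDemo {
    // Two letters, then any mix of letters and spaces, backtracking until the
    // match ends on a word boundary.
    static final Pattern PHRASE = Pattern.compile("[a-zA-Z]{2}[a-zA-Z ]*\\b");

    // Hypothetical helper: collects every non-overlapping match in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] samples = {
            "{ OPTIONAL { <> <> ? } FILTER ( ! bound(?) ) }", // [OPTIONAL, FILTER, bound]
            "{ FILTER not exists ( ! bound(?) ) }",           // [FILTER not exists, bound]
            "{ ? <> <> ; a <> }",                             // [] (single 'a' is too short)
            "{ <> <> ? . ? <> ? FILTER regex(? ?) }",         // [FILTER regex]
        };
        for (String sample : samples) {
            System.out.println(findAll(PHRASE, sample));
        }
    }
}
```

The last sample shows that FILTER regex also comes back as one group, because only a space separates the two words.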

To match only words followed by further words of at least two letters, separated by optional whitespace, use

[a-zA-Z]{2}(?:\\s*[a-zA-Z]{2,})*
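A small Java comparison of the two patterns, assuming a made-up input ("FILTER a bound", not from the question) with a one-letter word in the middle. The class name `StrictPhraseRegexDemo`, the `findAll` helper, and the `WHOLE_WORDS` pattern are hypothetical; `WHOLE_WORDS` is an editor's variant, not part of the answer. Tracing the second pattern also shows a caveat: the first word is taken as exactly two letters and every later chunk needs two or more, so a standalone three-letter word such as `not` comes back as `no`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictPhraseRegexDemo {
    // The answer's second pattern: every chunk after the first two letters needs
    // at least two letters, so a lone "a" cannot extend a match.
    static final Pattern STRICT_PHRASE = Pattern.compile("[a-zA-Z]{2}(?:\\s*[a-zA-Z]{2,})*");
    // The answer's first pattern, kept for comparison.
    static final Pattern PHRASE = Pattern.compile("[a-zA-Z]{2}[a-zA-Z ]*\\b");
    // Hypothetical variant (not from the answer): whole words of 2+ letters,
    // joined only by whitespace.
    static final Pattern WHOLE_WORDS = Pattern.compile("[a-zA-Z]{2,}(?:\\s+[a-zA-Z]{2,})*");

    // Hypothetical helper: collects every non-overlapping match in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "FILTER a bound"; // made-up input
        System.out.println(findAll(STRICT_PHRASE, input)); // [FILTER, bound]
        System.out.println(findAll(PHRASE, input));        // [FILTER a bound]
        // A standalone three-letter word is cut short by STRICT_PHRASE.
        System.out.println(findAll(STRICT_PHRASE, "not")); // [no]
        System.out.println(findAll(WHOLE_WORDS, "not"));   // [not]
    }
}
```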

Upvotes: 3

logi-kal

Reputation: 7880

You can use one of these, which respect your original pattern:

[^\d\W\\a\@]([^\d\W\\a\@]| )*\b
[^\d\W\\a\@]+( +[^\d\W\\a\@]+)*

See demo: 1 and 2
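A minimal Java sketch of both patterns, assuming they are written as Java string literals (every backslash doubled). The class name `OriginalClassPhraseDemo` and the `findAll` helper are hypothetical; the inputs are sample strings from the question. Because both patterns reuse the question's character class, digits and lowercase `a` are still excluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalClassPhraseDemo {
    // First pattern: one allowed character, then allowed characters or spaces,
    // backtracking until the match ends on a word boundary.
    static final Pattern BOUNDARY_VERSION =
            Pattern.compile("[^\\d\\W\\\\a\\@]([^\\d\\W\\\\a\\@]| )*\\b");
    // Second pattern: a word, then zero or more (spaces + word) pairs.
    static final Pattern SPACE_SEPARATED_VERSION =
            Pattern.compile("[^\\d\\W\\\\a\\@]+( +[^\\d\\W\\\\a\\@]+)*");

    // Hypothetical helper: collects every non-overlapping match in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] samples = {
            "{ FILTER not exists ( ! bound(?) ) }", // [FILTER not exists, bound]
            "{ <> <> ?@en }",                       // [en]
            "{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }",   // [] (digits are excluded)
        };
        for (String sample : samples) {
            // Both patterns give the same result on these lines.
            System.out.println(findAll(BOUNDARY_VERSION, sample));
            System.out.println(findAll(SPACE_SEPARATED_VERSION, sample));
        }
    }
}
```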

Upvotes: 1

lmartens

Reputation: 1502

\w+(?:\s*\w+)*

for capturing all groups, including the single-character matches a and 2

\w{2}(?:\s*\w+)*

for capturing only groups that start with at least two characters

You can replace \w with [a-zA-Z] to exclude digits (and underscores, which \w also matches).

See https://regex101.com/r/cP3Uri/7
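A minimal Java sketch contrasting the two patterns on sample lines from the question, assuming Java string-literal escaping. The class name `WordRunDemo` and the `findAll` helper are hypothetical. Each line is matched separately because `\s` also matches line breaks, so running the patterns over the whole block at once could in principle join words across lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordRunDemo {
    // First pattern: any run of word characters ([a-zA-Z0-9_]), optionally
    // continued by more words after optional whitespace.
    static final Pattern ALL_GROUPS = Pattern.compile("\\w+(?:\\s*\\w+)*");
    // Second pattern: the group must start with two word characters.
    static final Pattern MULTI_CHAR_GROUPS = Pattern.compile("\\w{2}(?:\\s*\\w+)*");

    // Hypothetical helper: collects every non-overlapping match in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] samples = {
            "{ ? <> <> ; a <> }",                   // [a] []
            "{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }",   // [2] []
            "{ FILTER not exists ( ! bound(?) ) }", // both: [FILTER not exists, bound]
        };
        for (String sample : samples) {
            System.out.println(findAll(ALL_GROUPS, sample) + " " + findAll(MULTI_CHAR_GROUPS, sample));
        }
    }
}
```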

Upvotes: 2
