Reputation: 1117
I am in my first few days of learning regular expressions. I am trying to do a simple pattern match to find occurrences of @@@XXX@@@ markers in my log file, where XXX is always an uppercase word; underscores are allowed, but spaces and digits are not. There can be zero or more spaces between the starting @@@ and the word, or between the word and the terminating @@@.
Some allowed examples: @@@CAT@@@
@@@ CAT@@@
@@@ CAT @@@
@@@ CAT_DOG @@@
I tried doing something like:
Pattern pattern = Pattern.compile("\\@{3}(\\s* \\w \\s*)\\@{3}");
Doesn't it mean: check for 3 instances of @, followed by 0 to n instances of space, followed by a word, followed again by 0 to n instances of space, followed by 3 instances of @? It captures the cases with @@ but does not capture where more than 3 @ are used. How do I specify that there are 3 and only 3 instances of @? And obviously I still have not plugged in the uppercase restriction.
Upvotes: 1
Views: 77
Reputation:
Here is what you should do.
[^@]?@{3}\s*([A-Z_]*)\s*@{3}[^@]
[^@]? optionally matches any single character other than @ (intended to keep a fourth @ out of the match).
@{3} matches exactly 3 @ characters.
\s* matches ZERO or MORE whitespace characters.
([A-Z_]*) matches ZERO or MORE upper case letters or _ characters. The () that wrap the expression capture the contents in a group so you can extract the contents easily.
\s* matches ZERO or MORE whitespace characters.
@{3} matches exactly 3 @ characters.
[^@] matches any single character other than @ (intended to exclude a trailing @@@@).
Here is an interactive regular expressions page ( with your example worked out ) that I use all the time to work things like this out.
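For reference, here is a minimal Java sketch of pulling the captured word out of group 1 with this pattern. The class name GroupExtractDemo and the helper extractWord are hypothetical names chosen for illustration, and the sample inputs are made up. It also shows three behaviours worth knowing when using Matcher.find(): the trailing [^@] needs some character after the closing @@@, the optional leading [^@]? does not by itself reject a fourth @, and [A-Z_]* accepts an empty word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtractDemo {
    // The pattern from this answer, with backslashes doubled for a Java string literal.
    private static final Pattern MARKER =
            Pattern.compile("[^@]?@{3}\\s*([A-Z_]*)\\s*@{3}[^@]");

    // Hypothetical helper: returns the contents of group 1, or null if nothing is found.
    static String extractWord(String input) {
        Matcher m = MARKER.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Typical case: the marker sits in the middle of a line.
        System.out.println(extractWord("x @@@ CAT_DOG @@@ y"));   // CAT_DOG

        // Nothing follows the closing @@@, so the trailing [^@] cannot match.
        System.out.println(extractWord("@@@CAT@@@"));             // null

        // find() may start at the second @, so four leading @ still match.
        System.out.println(extractWord("@@@@ CAT @@@ "));         // CAT

        // [A-Z_]* allows zero letters, so an empty word is accepted.
        System.out.println("[" + extractWord("@@@@@@ ") + "]");   // []
    }
}
```

If those cases matter for your log file, requiring at least one letter with + and checking the neighbouring characters without consuming them (lookarounds) are the usual ways to tighten this up.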
Upvotes: 1
Reputation: 86230
Try this:
(?:[^@]|^)@{3}(\s*[A-Z_]+\s*)@{3}(?!@)
// or with Java escaping
(?:[^@]|^)@{3}(\\s*[A-Z_]+\\s*)@{3}(?!@)
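Putting a literal space in a pattern makes a required space, so a b is different from ab. In your original pattern you had literal spaces around the \\w, between it and each \\s*. Also, \w matches a single word character, and it matches lower-case and upper-case letters alike (as well as digits and _). Using a character class such as [A-Z_]+ makes only upper-case letters and underscores match.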
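A small Java sketch of those two points, in case it helps to see them run. The class name LiteralSpaceDemo and the helper fullMatch are hypothetical names for illustration; the pattern in ORIGINAL is the one from the question, and the sample strings are made up.

```java
import java.util.regex.Pattern;

public class LiteralSpaceDemo {
    // The pattern from the question, unchanged.
    static final String ORIGINAL = "\\@{3}(\\s* \\w \\s*)\\@{3}";

    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // A literal space in the pattern must be present in the input.
        System.out.println(fullMatch("a b", "a b"));             // true
        System.out.println(fullMatch("a b", "ab"));              // false

        // The original pattern needs: space, ONE word character, space.
        System.out.println(fullMatch(ORIGINAL, "@@@ C @@@"));    // true
        System.out.println(fullMatch(ORIGINAL, "@@@ CAT @@@"));  // false
        System.out.println(fullMatch(ORIGINAL, "@@@CAT@@@"));    // false

        // \w+ accepts lower case and digits; [A-Z_]+ does not.
        System.out.println(fullMatch("\\w+", "cat9"));           // true
        System.out.println(fullMatch("[A-Z_]+", "cat9"));        // false
        System.out.println(fullMatch("[A-Z_]+", "CAT_DOG"));     // true
    }
}
```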
If you want to reject more than three @ characters (as one of the comments suggests), you have to add a little extra code.
At the start we put this, which says: match either a non-@ character or the start of the string (^). The (?:) makes the group non-capturing, meaning we don't care about this part of the match.
(?:[^@]|^)
At the end we have to say that the following character cannot be a @. (?!) is a negative lookahead: it fails if the pattern inside it could match at that point. It doesn't end up capturing or consuming that character, because it is a zero-width lookahead.
(?!@)
I updated the patterns at the top.
The new patterns will not match the following:
@@@ CAT_DOG @@@@
@@@@ CAT_DOG @@@
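As a rough sketch of how the Java-escaped pattern could be used, the snippet below collects every marker word found in one log line. The class name MarkerFinder and the method findMarkers are hypothetical, the sample lines are made up, and it assumes the log is processed one line at a time (with default flags, ^ only matches at the very start of the input). Group 1 includes the optional surrounding whitespace, so it is trimmed before being returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerFinder {
    // The Java-escaped pattern from the top of this answer.
    private static final Pattern MARKER =
            Pattern.compile("(?:[^@]|^)@{3}(\\s*[A-Z_]+\\s*)@{3}(?!@)");

    // Hypothetical helper: returns the word of every marker found in the line.
    static List<String> findMarkers(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = MARKER.matcher(line);
        while (m.find()) {
            // Group 1 holds the word plus any padding spaces, so trim it.
            words.add(m.group(1).trim());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findMarkers("@@@CAT@@@"));                  // [CAT]
        System.out.println(findMarkers("log: @@@ CAT_DOG @@@ done"));  // [CAT_DOG]
        System.out.println(findMarkers("@@@ CAT_DOG @@@@"));           // []
        System.out.println(findMarkers("@@@@ CAT_DOG @@@"));           // []
    }
}
```

Note that the leading (?:[^@]|^) consumes the character before the marker, so m.group() (the whole match) can start one character early; m.group(1) is not affected.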
Upvotes: 1