phewataal

Reputation: 1117

Regular Expression pattern

I am in my first few days of learning regular expressions. I am trying to do a simple pattern match to find occurrences of @@@XXX@@@ markers in my log file, where XXX is always an uppercase word: no spaces or digits are allowed in it, but underscores are. There can be zero or more spaces between the opening @@@ and the word, and between the word and the closing @@@.

Some allowed examples: @@@CAT@@@

@@@ CAT@@@

@@@ CAT @@@

@@@ CAT_DOG @@@

I tried doing something like:

Pattern pattern = Pattern.compile("\\@{3}(\\s* \\w \\s*)\\@{3}");

Doesn't it mean: check for 3 instances of @, followed by 0 to n spaces, followed by a word, followed again by 0 to n spaces, followed by 3 instances of @? It captures the cases with @@ but does not capture where more than 3 @ are used. How do I specify that there are 3 and only 3 instances of @? And obviously I have not yet plugged in the uppercase restriction.

Upvotes: 1

Views: 77

Answers (3)

Sufian Latif

Reputation: 13356

Try this:

"(^|[^@])@{3}\s*[_A-Z]+\s*@{3}($|[^@])"
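For use from Java, the backslashes need doubling inside the string literal. The following is only a minimal sketch: the class name MarkerFinder, the helper firstMarker, and the extra capturing group around the word are assumptions added here for illustration, not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerFinder {
    // The pattern above with Java string escaping. The parentheses around
    // [_A-Z]+ are an addition (assumption) so the word can be read as group 2.
    private static final Pattern MARKER =
            Pattern.compile("(^|[^@])@{3}\\s*([_A-Z]+)\\s*@{3}($|[^@])");

    // Returns the word inside the first marker on the line, or null if none is found.
    static String firstMarker(String line) {
        Matcher m = MARKER.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMarker("@@@ CAT_DOG @@@")); // CAT_DOG
        System.out.println(firstMarker("@@@@ CAT @@@"));    // null
    }
}
```

Note that the leading and trailing groups consume the neighbouring characters, so they become part of the overall match even though the word itself is isolated in group 2.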

Upvotes: 0

user177800

Reputation:

Here is what you should do.

[^@]?@{3}\s*([A-Z_]*)\s*@{3}[^@]
  1. [^@]? matches any single character other than @ optionally ( to exclude matching @@@@ )

  2. @{3} matches exactly 3 @ characters

  3. \s* matches ZERO or MORE whitespace characters

  4. [A-Z_]* matches ZERO or MORE upper case letters or _ characters. The () that wrap the expression capture the contents in a group so you can extract them easily.

  5. \s* matches ZERO or MORE whitespace characters

  6. @{3} matches exactly 3 @ characters

  7. [^@] matches any single character other than @ ( to exclude matching @@@@ )
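The steps above can be tried from Java roughly as follows. This is a sketch only: the class name GroupExtraction and the helper capturedWord are made up for illustration, and the sample inputs are assumptions rather than lines from a real log.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtraction {
    // The pattern from this answer, with backslashes doubled for a Java string.
    private static final Pattern MARKER =
            Pattern.compile("[^@]?@{3}\\s*([A-Z_]*)\\s*@{3}[^@]");

    // Returns the contents of group 1 for the first match, or null if nothing matches.
    static String capturedWord(String line) {
        Matcher m = MARKER.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capturedWord("start @@@ CAT_DOG @@@ end")); // CAT_DOG
        // The final [^@] needs a character to consume, so a marker at the
        // very end of the input is not matched.
        System.out.println(capturedWord("@@@ CAT @@@"));               // null
    }
}
```

Two behaviours are worth checking against your own data. The trailing [^@] requires some character after the marker, and because the leading [^@]? is optional, an input such as "@@@@ CAT @@@ x" still matches starting from the second @.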

Here is an interactive regular expressions page ( with your example worked out ) that I use all the time to work things like this out.

Upvotes: 1

Brigand

Reputation: 86230

Try this:

(?:[^@]|^)@{3}(\s*[A-Z_]+\s*)@{3}(?!@)

// or with Java escaping

(?:[^@]|^)@{3}(\\s*[A-Z_]+\\s*)@{3}(?!@)

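Putting a literal space in a pattern makes that space required, so a b is different from ab. In your original pattern you had a literal space on each side of the \\w, between it and the \\s*. Also, \w matches a single word character, and it accepts lower case, upper case and digits alike. Using a character class such as [A-Z_] makes only upper-case letters and underscores match.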
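A quick way to see both points is to test them in isolation. This is a small sketch; the class name SpaceAndCase and the helper matchesWhole are hypothetical names used only here.

```java
import java.util.regex.Pattern;

public class SpaceAndCase {
    // True if the whole input matches the regex (thin wrapper over Pattern.matches).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A literal space in the pattern must also appear in the input.
        System.out.println(matchesWhole("a b", "ab"));          // false
        System.out.println(matchesWhole("a b", "a b"));         // true
        // \w accepts lower case and digits; [A-Z_] does not.
        System.out.println(matchesWhole("\\w+", "cat9"));       // true
        System.out.println(matchesWhole("[A-Z_]+", "cat9"));    // false
        System.out.println(matchesWhole("[A-Z_]+", "CAT_DOG")); // true
    }
}
```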


If you want to avoid more than three (as one of the comments suggests) you have to add a little extra code.

At the start we put this, which says: match either a non-@ character or the start of the string (^). The (?:) is a non-capturing group, which means we don't care about this part of the match and it is not stored as a group.

(?:[^@]|^)

At the end we have to say that the following character cannot be a @. (?!) is a negative lookahead: the match fails if the enclosed pattern would match at that position. It is zero-width, so it checks the next character without consuming or capturing it.

(?!@)

I updated the patterns at the top.

These new patterns will not match these.

@@@ CAT_DOG @@@@

@@@@ CAT_DOG @@@
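To check this yourself, something like the following should do. It is a sketch under a few assumptions: the class name ExactlyThree and the helper find are invented for the example, and the trim() call is an addition because group 1 of the pattern includes the surrounding whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactlyThree {
    // The Java-escaped pattern from the top of this answer.
    private static final Pattern MARKER =
            Pattern.compile("(?:[^@]|^)@{3}(\\s*[A-Z_]+\\s*)@{3}(?!@)");

    // Returns the word of the first marker with surrounding spaces trimmed,
    // or null if the line contains no valid marker.
    static String find(String line) {
        Matcher m = MARKER.matcher(line);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("@@@ CAT_DOG @@@"));     // CAT_DOG
        System.out.println(find("log: @@@CAT@@@ done")); // CAT
        System.out.println(find("@@@ CAT_DOG @@@@"));    // null
        System.out.println(find("@@@@ CAT_DOG @@@"));    // null
    }
}
```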

Upvotes: 1
