Steve Ford

Reputation: 75

Java Pattern regex search between strings

Given the following strings (stringToTest):

  1. G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]
  2. G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]

And the Pattern:

Pattern p = Pattern.compile("G2:\\S+RQ:3,4");
if (p.matcher(stringToTest).find())
{
    // Match
}

For string 1 I DON'T want to match, because RQ:3,4 is associated with the G3 section, not G2, and I want string 2 to match, as RQ:3,4 is associated with G2 section.

The problem with the current regex is that it's searching too far and reaching the RQ:3,4 eventually in case 1 even though I don't want to consider past the G2 section.

It's also possible that the stringToTest might be (just one section):

G2:7JAPjGdnGy8jxR8[RQ:3,4]

The strings 7JAPjGdnGy8jxR8 and jRo6pN8ZW9aglYz are variable length hashes.

Can anyone help me with the correct regex to use, so that it starts looking at G2 for RQ:3,4 but stops if it reaches the end of the string or -G (the start of the next section)?

Upvotes: 3

Views: 122

Answers (3)

Wiktor Stribiżew

Reputation: 626689

The problem is that \S matches any non-whitespace char (including - and G), and the regex engine parses the text from left to right. Once it finds G2: it grabs all non-whitespace chars to the right (since \S+ is a greedy subpattern) and then backtracks to find the rightmost occurrence of RQ:3,4, which may sit in a later section.
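To see the overshoot described above, a minimal sketch can print what the original pattern actually consumes on string 1. The class and helper names (GreedyOvershoot, firstMatch) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyOvershoot {
    // Returns the text of the first match, or null when the pattern is not found.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s1 = "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]";
        // The greedy \S+ runs through "-G3:" and stops at the RQ:3,4 of the G3 section.
        // prints: G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4
        System.out.println(firstMatch("G2:\\S+RQ:3,4", s1));
    }
}
```

The match starts in the G2 section but ends inside the G3 section, which is why find() returns true for string 1.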

In a general case, you may use

String regex = "G2:(?:(?!-G)\\S)*RQ:3,4";

See the regex demo. (?:(?!-G)\S)* is a tempered greedy token that will match 0+ occurrences of a non-whitespace char that does not start a -G substring.
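As a quick check of the tempered greedy token, the sketch below runs it against the three inputs from the question. The class and method names (TemperedTokenCheck, g2HasRq34) are assumptions made for this example.

```java
import java.util.regex.Pattern;

class TemperedTokenCheck {
    // Each \S is consumed only if a "-G" does not start at that position,
    // so the match cannot run into the next section.
    private static final Pattern P = Pattern.compile("G2:(?:(?!-G)\\S)*RQ:3,4");

    static boolean g2HasRq34(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", // RQ:3,4 only in G3
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", // RQ:3,4 in G2
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]"                             // single section
        };
        for (String s : inputs) {
            System.out.println(g2HasRq34(s) + "  " + s);
        }
    }
}
```

With these inputs the first string should be rejected and the other two accepted.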

If the hyphen is only possible in front of the next section, you may subtract - from \S:

String regex = "G2:[^\\s-]*RQ:3,4"; // using a negated character class
// or, equivalently:
String regex2 = "G2:[\\S&&[^-]]*RQ:3,4"; // using character class subtraction

See this regex demo. [^\\s-]* will match 0 or more chars other than whitespace and -.
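The trade-off of the hyphen-excluding variants can be sketched as follows. The last input, with a hyphen inside the hash, is hypothetical and only there to show where they differ from the tempered greedy token; the class and method names are likewise made up.

```java
import java.util.regex.Pattern;

class HyphenClassCheck {
    static final Pattern NEGATED = Pattern.compile("G2:[^\\s-]*RQ:3,4");
    static final Pattern SUBTRACTED = Pattern.compile("G2:[\\S&&[^-]]*RQ:3,4");
    static final Pattern TEMPERED = Pattern.compile("G2:(?:(?!-G)\\S)*RQ:3,4");

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String s1 = "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]";
        String s2 = "G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]";
        // Hypothetical input: a hyphen inside the hash itself.
        String hyphenInHash = "G2:7JAP-jGdn[RQ:3,4]";

        // Both character-class forms agree on the inputs from the question.
        System.out.println(found(NEGATED, s1) + " " + found(SUBTRACTED, s1));
        System.out.println(found(NEGATED, s2) + " " + found(SUBTRACTED, s2));

        // Any hyphen stops the character-class forms, while the tempered
        // token only stops at a hyphen that is followed by G.
        System.out.println(found(NEGATED, hyphenInHash) + " " + found(TEMPERED, hyphenInHash));
    }
}
```

If hashes can never contain a hyphen, the character-class forms are the simpler choice, since they need no lookahead at every position.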

Upvotes: 1

Julio

Reputation: 5308

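Try to use [^[] instead of \S in this regex: G2:[^[]*\[RQ:3,4 (in Java, escape the bracket inside the class, i.e. [^\[], because an unescaped [ there opens a nested character class)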

[^[] means any character but [

Demo

(considering that strings like this: G2:7JAP[jGd]nGy8[]R8[RQ:3,4] are not possible)
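A minimal Java sketch of this approach follows. It escapes the bracket inside the class, since Java treats an unescaped [ within a character class as the start of a nested class. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class BracketClassCheck {
    // [^\[]* consumes the hash up to the first "[", then "[RQ:3,4" must follow directly.
    private static final Pattern P = Pattern.compile("G2:[^\\[]*\\[RQ:3,4");

    static boolean g2HasRq34(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]"));
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]"));
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:3,4]"));
        // The caveat from the answer: brackets inside the hash defeat the pattern.
        System.out.println(g2HasRq34("G2:7JAP[jGd]nGy8[]R8[RQ:3,4]"));
    }
}
```

The last call illustrates the stated limitation: the first [ in the hash ends the [^\[]* run before RQ:3,4 is reached.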

Upvotes: 0

anubhava

Reputation: 784888

You may use this regex with a negative lookahead in between:

G2:(?:(?!G\d+:)\S)*RQ:3,4

RegEx Demo

RegEx Details:

  • G2:: Match literal text G2:
  • (?: Start a non-capture group
    • (?!G\d+:): Assert that we don't have G, one or more digits, and : ahead of us
    • \S: Match a non-whitespace character
  • )*: End non-capture group. Match 0 or more of this
  • RQ:3,4: Match literal text RQ:3,4

In Java use this regex:

String re = "G2:(?:(?!G\\d+:)\\S)*RQ:3,4";
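A short sketch of how this could be wired up and checked against the question's inputs is below. The class and method names are assumptions, and the underscore-separated input is hypothetical, included only to show that this variant keys on the next G<digits>: header rather than on the hyphen.

```java
import java.util.regex.Pattern;

class SectionLookaheadCheck {
    // Stops consuming as soon as the next section header (G, digits, colon) is ahead.
    private static final Pattern P = Pattern.compile("G2:(?:(?!G\\d+:)\\S)*RQ:3,4");

    static boolean g2HasRq34(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]"));
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]"));
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:3,4]"));
        // Hypothetical separator other than "-": still rejected, because the
        // lookahead watches for "G3:" and not for the separator character.
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:1,2]_G3:jRo6pN8ZW9aglYz[RQ:3,4]"));
    }
}
```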

Upvotes: 2
