user3329098

Nested regexps and replace

I have strings like this <p0=v0 p1=v1 p2=v2 ....> and I want to swap pX with vX to have something like <v0=p0 v1=p1 v2=p2 ....> using regexps. I want only pairs in <> to be swapped.

I wrote:

Pattern pattern = Pattern.compile("<(\\w*)=(\\w*)>");
Matcher matcher = pattern.matcher("<p1=v1>");
System.out.println(matcher.replaceAll("$2=$1"));

But it works only with a single pair pX=vX. Could someone explain to me how to write a regexp that works for multiple pairs?

Upvotes: 3

Views: 156

Answers (5)

Casimir et Hippolyte

Reputation: 89565

You can use this pattern:

"((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)"

To ensure that the pairs come after an opening angle bracket, the pattern starts with:

(?:<|\\G(?<!\\A))

which means: an opening angle bracket OR the position at the end of the previous match.

\\G is an anchor for the position immediately after the last match or the beginning of the string (in other words, it is the last position of the regex engine in the string, which is zero at the start of the string). To avoid a match at the start of the string, I added a negative lookbehind (?<!\\A), meaning: not at the start of the string.

This trick forces each pair to be preceded by another pair or by a <.

Example:

String subject = "p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4";
String pattern = "((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)";
String result = subject.replaceAll(pattern, "$1$4$3$2");
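
For reference, here is a self-contained sketch of the example above that prints the result. The class and method names (SwapAfterBracket, swap) are only illustrative and not part of the original answer.

```java
final class SwapAfterBracket {
    // Group 1: "<" or the end of the previous match, plus optional whitespace.
    // Groups 2 to 4: pN, the "=" with surrounding whitespace, vN.
    static final String PATTERN =
        "((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)";

    static String swap(String subject) {
        // Keep the prefix and the "=", exchange pN and vN.
        return subject.replaceAll(PATTERN, "$1$4$3$2");
    }

    public static void main(String[] args) {
        String subject = "p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4";
        System.out.println(swap(subject));
        // p5=v5 <v0=p0 v1=p1 v2=p2 v3=p3> p4=v4
    }
}
```

The pairs p5=v5 and p4=v4 stay untouched because neither follows a < or a previous match.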

If you need p and v to have the same number you can change it to:

String pattern = "((?:<|\\G(?<!\\A))\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";
String result = subject.replaceAll(pattern, "$1$5$4$2");
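
A small sketch of how the backreference variant behaves, using made-up inputs (the class name SwapSameNumber is hypothetical). It suggests that a mismatched pair is not only skipped but also breaks the chain for the pairs after it:

```java
final class SwapSameNumber {
    // Group 3 captures the digits of pN; "v\\3" then requires the same digits.
    static final String PATTERN =
        "((?:<|\\G(?<!\\A))\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";

    static String swap(String subject) {
        return subject.replaceAll(PATTERN, "$1$5$4$2");
    }

    public static void main(String[] args) {
        System.out.println(swap("<p0=v0 p1=v1>"));       // <v0=p0 v1=p1>
        // p1=v2 does not match, so \G cannot reach p3=v3 either.
        System.out.println(swap("<p0=v0 p1=v2 p3=v3>")); // <v0=p0 p1=v2 p3=v3>
    }
}
```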

If parts between angle brackets can contain other things (that are not pairs):

String pattern = "((?:<|\\G(?<!\\A))(?:[^\\s>]+\\s*)*?\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";
String result = subject.replaceAll(pattern, "$1$5$4$2");

Note: all these patterns only check for an opening angle bracket; they don't check for a closing one. If a closing angle bracket is missing, pairs are replaced until there are no more contiguous pairs (first two patterns) or until the next closing angle bracket or the end of the string (third pattern).
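
To illustrate the note with the first pattern, here is a minimal sketch with made-up inputs that lack a closing bracket (the class name MissingCloseBracket is hypothetical):

```java
final class MissingCloseBracket {
    static final String PATTERN =
        "((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)";

    static String swap(String subject) {
        return subject.replaceAll(PATTERN, "$1$4$3$2");
    }

    public static void main(String[] args) {
        // No closing bracket: contiguous pairs after "<" are still swapped.
        System.out.println(swap("<p0=v0 p1=v1 p2=v2"));   // <v0=p0 v1=p1 v2=p2
        // A non-pair token breaks the chain, so p2=v2 is left alone.
        System.out.println(swap("<p0=v0 p1=v1 x p2=v2")); // <v0=p0 v1=p1 x p2=v2
    }
}
```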

You can check for the presence of a closing angle bracket by adding (?=[^<>]*>) at the end of each pattern. However, adding this makes the pattern much less efficient. It is better to find the parts between angle brackets with (?<=<)[^<>]++(?=>) and to perform the replacement of pairs in a callback function. You can take a look at this post to implement it.
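
As a rough sketch of the callback approach (not the code from the linked post): on Java 9 and later, Matcher.replaceAll(Function<MatchResult, String>) can play the role of the callback. The class name SwapInBrackets and the inner pair pattern (\w+)=(\w+) are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwapInBrackets {
    // Content strictly between "<" and ">"; the possessive quantifier
    // never gives characters back once it has consumed them.
    static final Pattern BRACKETED = Pattern.compile("(?<=<)[^<>]++(?=>)");
    // Assumed shape of a pair; tighten to p[0-9]+ / v[0-9]+ if needed.
    static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    static String swap(String subject) {
        return BRACKETED.matcher(subject).replaceAll(match ->
            // The callback result is treated as a replacement string,
            // so any "$" or "\" in it has to be quoted.
            Matcher.quoteReplacement(
                PAIR.matcher(match.group()).replaceAll("$2=$1")));
    }

    public static void main(String[] args) {
        System.out.println(swap("p5=v5 <p0=v0 p1=v1> p4=v4 <p6=v6"));
        // p5=v5 <v0=p0 v1=p1> p4=v4 <p6=v6
    }
}
```

The unclosed <p6=v6 is left alone because the lookahead for > fails there.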

Upvotes: 0

user557597

Since Java supports the \G anchor, this will work for non-nested <>'s
Find: ((?:(?!\A|<)\G|<)[^<>]*?)(\w+)=(\w+)(?=[^<>]*?>)
Replace (globally): $1$3=$2
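
In Java, the find/replace pair above could be applied roughly as follows. This is a sketch with the backslashes doubled for a Java string literal; the class name SwapWithLookahead and the second input are made up for illustration.

```java
final class SwapWithLookahead {
    static final String FIND =
        "((?:(?!\\A|<)\\G|<)[^<>]*?)(\\w+)=(\\w+)(?=[^<>]*?>)";

    static String swap(String input) {
        // String.replaceAll replaces every match, i.e. "globally".
        return input.replaceAll(FIND, "$1$3=$2");
    }

    public static void main(String[] args) {
        System.out.println(swap("<p0=v0 p1=v1  p2=v2 ....>"));
        // <v0=p0 v1=p1  v2=p2 ....>
        System.out.println(swap("a=b <p0=v0 p1=v1> c=d"));
        // a=b <v0=p0 v1=p1> c=d
    }
}
```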

Regex explained

 (                     # (1 start)
      (?:
           (?! \A | < )
           \G                    # Start at last match
        |  
           <                     # Or, <
      )
      [^<>]*? 
 )                     # (1 end)
 ( \w+ )               # (2)
 =
 ( \w+ )               # (3)
 (?= [^<>]*? > )       # There must be a closing > ahead

Perl test case

$/ = undef;
$str = <DATA>;
$str =~ s/((?:(?!\A|<)\G|<)[^<>]*?)(\w+)=(\w+)(?=[^<>]*?>)/$1$3=$2/g;
print $str;
__DATA__
<p0=v0 p1=v1  p2=v2 ....>

Output >>

<v0=p0 v1=p1  v2=p2 ....>

Upvotes: 0

anubhava

Reputation: 785376

This should work to swap only those pairs between < and >:

String string = "<p0=v0 p1=v1 p2=v2> a=b c=d xyz=abc <foo=bar baz=bat>";
Pattern pattern1 = Pattern.compile("<[^>]+>");
Pattern pattern2 = Pattern.compile("(\\w+)=(\\w+)");
Matcher matcher1 = pattern1.matcher(string);
StringBuffer sbuf = new StringBuffer();
while (matcher1.find()) {
    Matcher matcher2 = pattern2.matcher(matcher1.group());
    // quoteReplacement keeps any "$" or "\" in the swapped text literal
    matcher1.appendReplacement(sbuf, Matcher.quoteReplacement(matcher2.replaceAll("$2=$1")));
}
matcher1.appendTail(sbuf);
System.out.println(sbuf);

OUTPUT:

<v0=p0 v1=p1 v2=p2> a=b c=d xyz=abc <bar=foo bat=baz>

Upvotes: 0

KeyNone

Reputation: 9160

Replacing everything between < and > (let's call it a tag) with a single regex is, imho, not possible if the same pattern can occur outside the tag.

Instead of replacing everything at once, I'd go for two regexes:

String str = "<p1=v1 p2=v2> p3=v3 <p4=v4>";
Pattern insideTag = Pattern.compile("<(.+?)>");
Matcher m = insideTag.matcher(str);

while(m.find()) {
    str = str.replace(m.group(1), m.group(1).replaceAll("(\\w*)=(\\w*)", "$2=$1"));
}
System.out.println(str);

//prints: <v1=p1 v2=p2> p3=v3 <v4=p4>

The matcher grabs everything between < and >, and for each match the content of the first capturing group is replaced in the original string with its swapped version, but only where it matches (\w*)=(\w*), of course.
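
One caveat with the loop above: String.replace substitutes every occurrence of the captured text, so an identical pair outside a tag gets swapped as well. A possible variant, sketched here under the assumption that only the captured span should change, rebuilds the string from the match offsets instead. The names SwapByOffsets and swap are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwapByOffsets {
    static final Pattern INSIDE_TAG = Pattern.compile("<(.+?)>");

    static String swap(String str) {
        Matcher m = INSIDE_TAG.matcher(str);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the captured span unchanged.
            out.append(str, last, m.start(1));
            // Swap pairs only inside the captured span.
            out.append(m.group(1).replaceAll("(\\w*)=(\\w*)", "$2=$1"));
            last = m.end(1);
        }
        // Copy whatever follows the last tag content.
        out.append(str, last, str.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(swap("<p1=v1> p1=v1")); // <v1=p1> p1=v1
    }
}
```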

Trying it with

<p1=v1 p2=v2 just some trash> p3=v3 <p4=v4>

gives the output

<v1=p1 v2=p2 just some trash> p3=v3 <v4=p4>

Upvotes: 0

Mena

Reputation: 48404

Simple, use groups:

String input = "<p0=v0 p1=v1 p2=v2>";
//                                   |group 1
//                                   ||matches "p" followed by one digit
//                                   ||      |... followed by "="
//                                   ||      ||group 2
//                                   ||      |||... followed by "v", followed by one digit
//                                   ||      |||          |replaces group 2 with group 1,
//                                   ||      |||          |re-writes "=" in the middle
System.out.println(input.replaceAll("(p[0-9])=(v[0-9])", "$2=$1"));

Output:

<v0=p0 v1=p1 v2=p2>
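
Note that this pattern is not tied to the angle brackets and accepts exactly one digit, so it may also swap pairs outside <...> and skip multi-digit ones. A small check with made-up inputs (the class name UnanchoredSwap is hypothetical) illustrates both effects:

```java
final class UnanchoredSwap {
    static String swap(String input) {
        return input.replaceAll("(p[0-9])=(v[0-9])", "$2=$1");
    }

    public static void main(String[] args) {
        // The pair outside the brackets is swapped too.
        System.out.println(swap("<p0=v0> p1=v1")); // <v0=p0> v1=p1
        // Two-digit indices do not match at all.
        System.out.println(swap("<p10=v10>"));     // <p10=v10>
    }
}
```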

Upvotes: 2
