Aditya Garimella

Reputation: 933

Replace group 1 of Java regex without replacing the entire regex

I have a regex pattern that will have only one group. I need to find text in the input string that matches the pattern and replace ONLY match group 1. For example, I have the regex pattern and the string it is applied to, as shown below. The replacement string is "<--->".

Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");

The expected result is

plan p<--->s <--->der p<--->itia

I tried the following approaches:

    String test = "plan plans lander planitia";
    Pattern p = Pattern.compile("\\w*(lan)\\w+");
    Matcher m = p.matcher(test);
    String result = "";
    while(m.find()){
        result = test.replaceAll(m.group(1),"<--->");
    }
    System.out.print(result);

This gives the result:

p<---> p<--->s <--->der p<--->itia

Another approach

    String test = "plan plans lander planitia";
    Pattern p = Pattern.compile("\\w*(lan)\\w+");
    Matcher m = p.matcher(test);
    String result = "";
    while(m.find()){
        result = test.replaceAll("\\w*(lan)\\w+","<--->");
    }
    System.out.print(result);

Result is

plan <---> <---> <--->

I have gone through this link. There, the part of the string before the match is always the constant "foo", but in my case it varies. I have also looked at this and this, but I am unable to apply any of the solutions given there to my present scenario.

Any help is appreciated

Upvotes: 21

Views: 34167

Answers (4)

Wiktor Stribiżew

Reputation: 626950

You need to use the following pattern with capturing groups:

(\w*)lan(\w+)
^-1-^   ^-2-^

and replace with $1<--->$2

The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.

Java demo:

String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia

If you need to replace only Group 1 and keep the rest of the match, you can emulate a replace callback with Matcher#appendReplacement:

String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia

Here, since we process the input match by match, we should replace the Group 1 contents only once per match with replaceFirst, and since we replace that substring as a literal, we should wrap it in Pattern.quote.
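One caveat with the loop above: both replaceFirst and appendReplacement treat `$` and `\` in the replacement argument as special characters. A minimal sketch of how this could be guarded with Matcher.quoteReplacement follows; the class name QuotedGroupReplace and the helper replaceGroup1 are hypothetical names introduced here for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedGroupReplace {
    // Hypothetical helper: replaces the text of group 1 inside every match
    // with a literal string, keeping the rest of the match untouched.
    static String replaceGroup1(String regex, String input, String literal) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Replace the first occurrence of the group 1 text within the match.
            // Pattern.quote makes the search literal; quoteReplacement makes
            // the inserted text literal ('$' and '\' lose their special meaning).
            String rebuilt = m.group(0).replaceFirst(
                    Pattern.quote(m.group(1)),
                    Matcher.quoteReplacement(literal));
            // appendReplacement also interprets '$' and '\', so quote again.
            m.appendReplacement(sb, Matcher.quoteReplacement(rebuilt));
        }
        m.appendTail(sb); // append the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // A replacement containing '$' is inserted as plain text.
        System.out.println(
                replaceGroup1("\\w*(lan)\\w+", "plan plans lander planitia", "$1"));
        // prints: plan p$1s $1der p$1itia
    }
}
```

Without the two quoteReplacement calls, a replacement such as "$1" would be read as a group reference rather than inserted as text.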

Upvotes: 42

Ondřej Menčl

Reputation: 11

I like the other solutions. This is a slightly optimized, more robust version:

public static void main (String [] args) {
    int groupPosition = 1;
    String replacement = "foo";
    Pattern r = Pattern.compile("foo(bar)");
    Matcher m = r.matcher("bar1234foobar1234bar");
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        StringBuffer buf = new StringBuffer(m.group());
        buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement); 
        m.appendReplacement(sb, buf.toString());
    }
    m.appendTail(sb); 
    System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}

Upvotes: 1

Andreas

Reputation: 159124

To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().

That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.

   start(1)
      ↓  end(1)
      ↓    ↓
  \\w*(lan)\\w+
  ↑            ↑
start()      end()

You can then extract the values to keep.

String input = "plan plans lander planitia";

StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
    m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
                             "<--->" +
                             input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();

System.out.println(output);

Output

plan p<--->s <--->der p<--->itia

If you don't like that it uses the original string, you can use the matched substring instead.

StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
    String match = m.group();
    int start = m.start();
    m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
                             "<--->" +
                             match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();
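
On Java 9 or later, the same find/appendReplacement/appendTail loop could also be written with the Matcher.replaceAll overload that takes a function of the MatchResult. This is only a sketch under that version assumption; the class name LambdaGroupReplace and the helper replaceGroup1 are hypothetical.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LambdaGroupReplace {
    // Hypothetical helper (assumes Java 9+): rebuilds each match with the
    // span of group 1 swapped for a literal replacement.
    static String replaceGroup1(Pattern p, String input, String literal) {
        return p.matcher(input).replaceAll((MatchResult r) -> {
            String match = r.group();
            int start = r.start();
            // Offsets of group 1 are relative to the input, so shift them
            // by the start of the whole match.
            String rebuilt = match.substring(0, r.start(1) - start)
                    + literal
                    + match.substring(r.end(1) - start);
            // The returned string is treated like a replacement pattern,
            // so escape '$' and '\' to keep it literal.
            return Matcher.quoteReplacement(rebuilt);
        });
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w*(lan)\\w+");
        System.out.println(replaceGroup1(p, "plan plans lander planitia", "<--->"));
        // prints: plan p<--->s <--->der p<--->itia
    }
}
```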

Upvotes: 4

Sebastian Proske

Reputation: 8413

While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep it anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done with a lookahead, like (?=\w), so we actually match only lan in a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).
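
A minimal sketch of the lookahead approach described above, using the sample string from the question; the class name LookaheadReplace and the method replaceLan are hypothetical names chosen for illustration.

```java
class LookaheadReplace {
    // Replaces every "lan" that is directly followed by a word character.
    // The lookahead (?=\w) checks the next character without consuming it,
    // so only "lan" itself is part of the match and gets replaced.
    static String replaceLan(String input) {
        return input.replaceAll("lan(?=\\w)", "<--->");
    }

    public static void main(String[] args) {
        System.out.println(replaceLan("plan plans lander planitia"));
        // prints: plan p<--->s <--->der p<--->itia
    }
}
```

Note that this is not strictly equivalent to the original pattern: in a word containing "lan" more than once, such as "lanlans", the lookahead version replaces every qualifying occurrence, whereas \\w*(lan)\\w+ matches the word once and captures only one "lan".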

Upvotes: 1
