Netro

Reputation: 7297

Java regex and replacement

Hi, I am trying to understand regex replacement in Java. I have many regex/replacement pairs to apply to the text of a file: I want to read each regex and apply its replacement to the text. For example, in the following code I want to replace the word text with variable.

import java.util.regex.*;
public class regex1{
public static void main(String args[]){
    String s1 = "cat catches dog text";
    Pattern p1 = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");
    Matcher m1 = p1.matcher(s1);
    if (m1.find()){
        System.out.println(m1.group(1));
        s1 = m1.replaceFirst("variable $1");
        System.out.println(s1);
    }
    else{
        System.out.println("Else");
    }
}    
}

But the output I get is:

text
variable text

Can anyone explain how groups and replacement work in Java? How do I get the correct output?

Upvotes: 3

Views: 333

Answers (5)

Mr. Beer

Reputation: 255

String.replaceFirst takes 2 arguments: the regex and the replacement string.

So in your example, replace

s1 = m1.replaceFirst("variable $1");

with

s1 = s1.replaceFirst(m1.group(1), "variable");

Upvotes: 0

Enigmadan

Reputation: 3408

$1 is a back-reference: in the replacement string it stands for the text captured by the 1st set of parentheses in the pattern.

In your case $1 refers to the first group captured by matcher m1, which is the word "text".

Explanation of the code

                        referenced by $1
                             ↓↓↓↓
String s1 = "cat catches dog text";
                                                        reference $1
                                                           ↓    ↓
Pattern p1 = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");


It is important to note that \S (capital 'S') matches any non-whitespace character and that the + is greedy. This means we capture all non-whitespace characters up to the next whitespace or, simply put, the next word.

In this case the word being matched happens to be "text".

Matcher m1 = p1.matcher(s1);

When find() is called, m1 will match the whole of "cat catches dog text".

s1 = m1.replaceFirst("variable $1");

s1 is set to the original s1 ("cat catches dog text") with the first match of m1 ("cat catches dog" followed by any word) replaced by "variable" followed by that same word. Because the match covers the whole string, the result is "variable text".

If you want the replacement to be just "variable", with the captured word dropped, then you need to remove the $1.

s1 = m1.replaceFirst("variable");

s1 is set to the original s1 ("cat catches dog text") with the first match of m1 ("cat catches dog" followed by any word) replaced by "variable". The result is simply "variable".

If this is the case, you don't actually need the parentheses in the regex pattern either. They have no use (in this case) if you are not going to back-reference the group.
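To make the difference between the whole match and the captured group concrete, here is a minimal sketch using the pattern from the question. The class name GroupDemo and the helper replaceFirstMatch are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows what group(0), group(1) and $1 refer to.
class GroupDemo {
    static final Pattern P = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");

    // Replaces the first match of P in input with the given replacement string.
    static String replaceFirstMatch(String input, String replacement) {
        return P.matcher(input).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String s1 = "cat catches dog text";
        Matcher m = P.matcher(s1);
        if (m.find()) {
            System.out.println("group(0) = [" + m.group(0) + "]"); // the whole match
            System.out.println("group(1) = [" + m.group(1) + "]"); // only the captured word
        }
        // The whole match is replaced; $1 puts the captured word back.
        System.out.println(replaceFirstMatch(s1, "variable $1")); // variable text
        // Without $1 the captured word is dropped as well.
        System.out.println(replaceFirstMatch(s1, "variable"));    // variable
    }
}
```

The point to take away is that replaceFirst always replaces group(0), the entire match. Anything from the match that should survive has to be put back through a group reference.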

Upvotes: 0

Pshemo

Reputation: 124215

I am not entirely sure what you are trying to do. If you want to replace the word after \\s*cat\\s+catches\\s*dog\\s+ with variable, then maybe try it this way:

String s1 = "cat catches dog text";
Pattern p1 = Pattern.compile("(\\s*cat\\s+catches\\s*dog\\s+)(\\S+)");
Matcher m1 = p1.matcher(s1);
if (m1.find()) {
    System.out.println(m1.group(2));
    s1 = m1.replaceFirst("$1variable");
    System.out.println(s1);
} else {
    System.out.println("Else");
}

Now group 1 is (\\s*cat\\s+catches\\s*dog\\s+). You put it back into the replacement with $1 and add variable at the end.

output:

text
cat catches dog variable

BTW, you don't have to call if (m1.find()) before using replaceFirst or replaceAll, because both methods do their own search. Just use it like

String s1 = "cat catches dog text";
Pattern p1 = Pattern.compile("(\\s*cat\\s+catches\\s*dog\\s+)(\\S+)");
Matcher m1 = p1.matcher(s1);
s1 = m1.replaceFirst("$1variable");
System.out.println(s1);

or, if you won't need the Pattern and Matcher any more, just

String s1 = "cat catches dog text";
// Strings are immutable, so the result must be assigned back
s1 = s1.replaceFirst("(\\s*cat\\s+catches\\s*dog\\s+)(\\S+)","$1variable");
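Since the question mentions many regex/replacement pairs, the same idea can be applied in a loop. This is only a sketch under assumptions: the class RuleApplier, the method applyRules, and the second rule are invented for illustration, and the rules are hard-coded here instead of being read from a file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: applies several (regex -> replacement) rules in order.
class RuleApplier {
    // Runs every rule over the text, feeding each result into the next rule.
    static String applyRules(String text, Map<Pattern, String> rules) {
        String result = text;
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            result = rule.getKey().matcher(result).replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so rules run in the order added.
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile("(\\s*cat\\s+catches\\s*dog\\s+)(\\S+)"), "$1variable");
        rules.put(Pattern.compile("\\bdog\\b"), "mouse"); // made-up second rule
        System.out.println(applyRules("cat catches dog text", rules));
        // cat catches mouse variable
    }
}
```

Compiling each Pattern once and reusing it is cheaper than calling String.replaceAll with the regex string for every line of a large file. Rule order matters, because later rules see the output of earlier ones.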

Upvotes: 1

anubhava

Reputation: 784998

Use this code:

String s1 = "cat catches dog text";
Pattern p1 = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");
Matcher m1 = p1.matcher(s1);
if (m1.find()){
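    // Put back only the matched prefix, quoted so that '$' and '\' are taken literally
    s1 = m1.replaceFirst(Matcher.quoteReplacement(s1.substring(m1.start(), m1.start(1))) + "variable");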
}
else{
    System.out.println("Else");
}
System.out.println(s1);
// cat catches dog variable
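As an alternative to rebuilding the prefix inside the replacement string, the captured group can be overwritten by position using m.start(1) and m.end(1). The sketch below assumes the pattern from the question. The class GroupSplice and the method replaceGroup1 are names invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: overwrites only the text captured by group 1.
class GroupSplice {
    static final Pattern P = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");

    // Replaces group 1 of the first match with a literal string.
    // The text before and after the group is left untouched.
    static String replaceGroup1(String input, String literal) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return input; // no match: return the input unchanged
        }
        return new StringBuilder(input)
                .replace(m.start(1), m.end(1), literal)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceGroup1("cat catches dog text", "variable"));
        // cat catches dog variable
        System.out.println(replaceGroup1("costs $5: cat catches dog text now", "$var"));
        // costs $5: cat catches dog $var now
    }
}
```

Because nothing passes through a replacement string, characters such as $ and \ in either the input or the new text need no escaping, and the match does not have to start at the beginning of the string.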

Upvotes: 2

Ruchira Gayan Ranaweera

Reputation: 35557

Try this

import java.util.regex.*;
public class regex1{
public static void main(String args[]){
    String s1 = "cat catches dog text";
    Pattern p1 = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");
    Matcher m1 = p1.matcher(s1);
    if (m1.find()){
        System.out.println(m1.group(1));
        s1 = s1.replaceFirst(m1.group(1),"variable");
        System.out.println(s1);
    }
    else{
        System.out.println("Else");
    }
}
}
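One caveat with passing m1.group(1) to String.replaceFirst: the captured text is compiled as a regex and matched against the whole string again. The sketch below contrasts that with a Pattern.quote variant. QuoteDemo, unquoted and quoted are names made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: what happens when captured text is reused as a regex.
class QuoteDemo {
    static final Pattern P = Pattern.compile("\\s*cat\\s+catches\\s*dog\\s+(\\S+)");

    // The captured text is interpreted as a regex.
    static String unquoted(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? input.replaceFirst(m.group(1), "variable") : input;
    }

    // Pattern.quote makes the captured text match literally.
    static String quoted(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? input.replaceFirst(Pattern.quote(m.group(1)), "variable") : input;
    }

    public static void main(String[] args) {
        // "a.c" as a regex also matches "atc" inside "catches".
        System.out.println(unquoted("cat catches dog a.c")); // cat cvariablehes dog a.c
        System.out.println(quoted("cat catches dog a.c"));   // cat catches dog variable
        // Quoting does not help when the word occurs earlier in the string.
        System.out.println(quoted("cat catches dog cat"));   // variable catches dog cat
    }
}
```

For the input in the question both variants give the expected result. The last call shows that the first occurrence of the word is replaced, which is not necessarily the captured one. If that matters, replace by position with m.start(1) and m.end(1), or use a group reference in the replacement.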

Upvotes: 1
