Reputation: 2290
I have a string,
string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships."
I have another string, 'string2', which contains only phrases surrounded by '<NOUN>' and '</NOUN>' tags, separated by spaces.
string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>"
Note that the second string can have any number of noun-tagged phrases (based on 'string1'; e.g., if string1 has 3 nouns, string2 will have the same 3 nouns surrounded by noun tags).
I want to add the tags to 'string1' so that it becomes:
string1 = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships."
I used the following code to do this:
Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
Matcher m = p.matcher(string2);
while(m.find()) {
    string1= string1.replaceAll(m.group(1),m.group(0));
}
But it gives me the following output:
<NOUN><NOUN><NOUN>Sri Lanka</NOUN></NOUN> National Chess Championship</NOUN> this year and represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> at represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.
Can anyone please tell me how to do this correctly?
Or how to get the desired output from the given output?
Upvotes: 3
Views: 672
Reputation: 7986
Instead of:
string1= string1.replaceAll(m.group(1),m.group(0));
use:
string1= string1.replaceAll("(?<!<NOUN>)("+m.group(1)+")(?!</NOUN>)",m.group(0));
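The negative lookbehind (?<!<NOUN>) and negative lookahead (?!</NOUN>) skip any occurrence that directly follows <NOUN> or directly precedes </NOUN>, so text that is already tagged is not wrapped again.
See more about "Look Ahead and Look Behind Constructs" here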
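A minimal sketch of the same lookaround idea, with two additions that are assumptions and not part of the answer above: the noun text is passed through Pattern.quote so that characters such as '+' or '(' match literally, and the replacement is passed through Matcher.quoteReplacement so that '$' and '\' are not read as group references. The class name LookaroundTagger and the method tag are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundTagger {
    // Wraps every occurrence of each noun listed in taggedNouns, skipping
    // occurrences that directly follow <NOUN> or directly precede </NOUN>.
    static String tag(String text, String taggedNouns) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(taggedNouns);
        while (m.find()) {
            // Assumption: nouns may contain regex metacharacters, so quote them.
            String regex = "(?<!<NOUN>)" + Pattern.quote(m.group(1)) + "(?!</NOUN>)";
            // Quote the replacement so '$' and '\' are inserted literally.
            text = text.replaceAll(regex, Matcher.quoteReplacement(m.group(0)));
        }
        return text;
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Note that the lookarounds only inspect the characters immediately next to the match. A shorter noun that sits in the middle of an already tagged longer noun would still be wrapped, so this sketch relies on nested nouns starting or ending at a tag boundary, as they do in the question's example.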
Upvotes: 2
Reputation: 5686
The problem with your example is that Sri Lanka National Chess Championship is a noun, and Sri Lanka, a part of that string, is also a noun. So your matcher replaces the same text multiple times.
You can solve this by not replacing string fragments that have already been replaced. For each match, I break the string into three parts: before, the matched string, and after, and keep the parts in order. A Vector is a convenient data structure for this.
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Check {
    // Joins the parts back into a single string, in order.
    static String print(Vector<String> parts) {
        StringBuilder str = new StringBuilder();
        for (String part : parts) {
            str.append(part);
        }
        return str.toString();
    }
    public static void main(String[] args) {
        String string1;
        String string2;
        String expected;
        string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        expected = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.";
        Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
        Matcher m = p.matcher(string2);
        // Ordered fragments of string1; tagged fragments are never searched again.
        Vector<String> parts = new Vector<String>();
        parts.add(string1);
        while (m.find()) {
            for (int i = 0; i < parts.size(); i++) {
                // Skip fragments that have already been tagged.
                if (parts.elementAt(i).indexOf("<NOUN>") != -1) {
                    continue;
                }
                // Search the untagged fragment for the current noun.
                String cur = parts.elementAt(i);
                int disp = cur.indexOf(m.group(1));
                if (disp == -1) {
                    continue;
                }
                // Split the fragment into before, tagged match, and after.
                parts.remove(i);
                Vector<String> newParts = new Vector<String>();
                if (disp != 0) {
                    newParts.add(cur.substring(0, disp));
                }
                newParts.add(m.group(0));
                if ((disp + m.group(1).length()) != cur.length()) {
                    newParts.add(cur.substring(disp + m.group(1).length()));
                }
                // Insert at position i (also for i == 0) so the order is preserved.
                parts.addAll(i, newParts);
            }
        }
        string1 = print(parts);
        if (!string1.equals(expected)) {
            System.out.println("Unexpected output !!");
        } else {
            System.out.println("Correct !!");
        }
    }
}
You can rename the print method to stringify for convenience.
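As an alternative to splitting into parts, here is a rough single-pass sketch. It is a different technique from the Vector code above and rests on an assumption the question does not state explicitly: that string2 lists the nouns in the same order in which they occur in string1 (this holds for the example). A cursor marks how much of string1 has been consumed, so already tagged text is never searched again. The names SinglePassTagger and tag are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassTagger {
    // Tags nouns left to right. Assumes taggedNouns lists the nouns in the
    // order in which they appear in text.
    static String tag(String text, String taggedNouns) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(taggedNouns);
        StringBuilder out = new StringBuilder();
        int cursor = 0; // everything before cursor has already been copied to out
        while (m.find()) {
            String noun = m.group(1);
            int at = text.indexOf(noun, cursor);
            if (at < 0) {
                continue; // assumption: a noun not found after the cursor is skipped
            }
            out.append(text, cursor, at); // untouched text before the noun
            out.append(m.group(0));       // the noun together with its tags
            cursor = at + noun.length();
        }
        out.append(text.substring(cursor)); // remaining untouched tail
        return out.toString();
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Because the search uses String.indexOf and no regex on string1, the nouns need no escaping. If the nouns in string2 can appear out of order, this sketch will miss some of them and the part-splitting approach is the safer choice.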
Upvotes: 0