Reputation: 53
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestUtil {
    public static void main(String[] args) {
        StringBuffer test = new StringBuffer();
        test.append("abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">Homo</term></hi> </hi>. <term xml:lang=\"la\">Erectus</term>, <term xml:lang=\"la\">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        test.append("bbbbbbbbbbbbbbbbbbbbbb <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">sHomo</term></hi> </hi>. <term xml:lang=\"la\">sErectus</term>, <term xml:lang=\"la\">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        // Outer pattern: the <p rendition="#indent-1"> paragraphs whose terms should be changed.
        Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                Pattern.CASE_INSENSITIVE);
        Matcher regexMatcher = pattern.matcher(test.toString());
        System.out.println(test);
        // Empty the buffer so it can collect the rewritten output.
        test.delete(0, test.length());
        while (regexMatcher.find()) {
            // test.delete(regexMatcher.start(),test.length());
            // Only the text of the current <p> match is handed to the inner matcher.
            String matched = regexMatcher.group(0);
            Pattern termPatter = Pattern.compile("(<term xml:lang=\".*?\")(>)(.*?)(</term>)");
            Matcher termMatcher = termPatter.matcher(matched);
            if (termMatcher != null) {
                //termMatcher.start();
                System.out.println(termMatcher.groupCount());
                while (termMatcher.find()) {
                    System.out.println("0---" + termMatcher.group(0));
                    System.out.println(termMatcher.group(1));
                    System.out.println(termMatcher.group(2));
                    System.out.println(termMatcher.group(3));
                    System.out.println(termMatcher.group(4));
                    termMatcher.appendReplacement(test, appendSortKey(termMatcher.group(0), termMatcher.group(1), termMatcher.group(2), termMatcher.group(3), termMatcher.group(4)));
                }
                termMatcher.appendTail(test);
            }
            //regexMatcher.appendTail(test);
        }
        System.out.println(test);
    }

    // Rebuilds a <term> element with a sortKey attribute equal to its text content.
    private static String appendSortKey(String totStr, String termStart, String termStartEndTag, String termValue, String termEndTag) {
        if (totStr != null) {
            termStart = termStart + " " + "sortKey=\"" + termValue + "\"" + termStartEndTag;
            return termStart + termValue + termEndTag;
        }
        return null;
    }
}
I am trying to modify only the <term>...</term> elements, taking their content from the match of another regular expression (that outer match is a requirement), but I am losing the content before and after each match. Please let me know what mistake I am making.
The expected output is:
abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="Homo">Homo</term></hi> </hi>. <term xml:lang="la" sortKey="Erectus">Erectus</term>, <term xml:lang="la" sortKey="bimanus">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>bbbbbbbbbbbbbbbbbbbbbb <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="sHomo">sHomo</term></hi> </hi>. <term xml:lang="la" sortKey="sErectus">sErectus</term>, <term xml:lang="la" sortKey="sbimanus">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>
Upvotes: 0
Views: 186
Reputation: 3802
Change the following line of your code and it will give you the correct result:
Pattern pattern = Pattern.compile(".*<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
Pattern.CASE_INSENSITIVE);
/*
instead of the following
Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
Pattern.CASE_INSENSITIVE);
*/
Explanation:
The pattern <p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p> matches only the <p ...> ... </p> part, so the string passed to the inner matcher, and therefore everything written by appendReplacement, is only the <p ...> ... </p> part with the replacements applied.
The pattern .*<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p> matches Text <p ...> ... </p>, so after appendReplacement you get Text <p ...> ... </p> with the replacements applied.
Thus the output is the whole string, with <term xml:lang="la">text</term> replaced by <term xml:lang="la" sortKey="text">text</term>.
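As a complement to the explanation above, here is a minimal sketch of another way to keep the surrounding text: leave the outer pattern as it was in the question and let the outer matcher itself copy the unmatched text through its own appendReplacement and appendTail calls. The class name SortKeySketch and the helpers addSortKeys and rewriteTerms are hypothetical names made up for this illustration; the two patterns are the ones from the question. This is offered as one possible approach, not as the only fix. It may be worth considering because a leading .* does not cross line breaks unless Pattern.DOTALL is set, and any text after the last </p> is not part of a match either way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortKeySketch {
    // Outer pattern from the question, without the leading ".*".
    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>", Pattern.CASE_INSENSITIVE);
    // Inner pattern from the question.
    private static final Pattern TERM = Pattern.compile(
            "(<term xml:lang=\".*?\")(>)(.*?)(</term>)");

    // Hypothetical helper: rewrites terms inside matching paragraphs only,
    // copying all text before, between, and after the paragraphs unchanged.
    public static String addSortKeys(String input) {
        StringBuffer out = new StringBuffer();
        Matcher p = PARAGRAPH.matcher(input);
        while (p.find()) {
            String rewritten = rewriteTerms(p.group());
            // quoteReplacement keeps any '$' or '\' in the text literal.
            p.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        p.appendTail(out); // text after the last paragraph
        return out.toString();
    }

    // Hypothetical helper: adds sortKey="<term text>" to every <term> element.
    public static String rewriteTerms(String paragraph) {
        StringBuffer out = new StringBuffer();
        Matcher t = TERM.matcher(paragraph);
        while (t.find()) {
            String replacement = t.group(1) + " sortKey=\"" + t.group(3) + "\""
                    + t.group(2) + t.group(3) + t.group(4);
            t.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        t.appendTail(out); // rest of the paragraph after the last term
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "head <p rendition=\"#indent-1\">1. "
                + "<term xml:lang=\"la\">Homo</term></p> tail";
        System.out.println(addSortKeys(input));
    }
}
```

For the small input in main, both "head " and " tail" survive, and the term becomes <term xml:lang="la" sortKey="Homo">Homo</term>. Terms outside a matching <p> element are left untouched, which is the condition stated in the question.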
Upvotes: 1