santosh

Matcher.appendReplacement is not adding the starting content

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestUtil {


    public static void main(String[] args) {
        StringBuffer test = new StringBuffer(); 
        test.append("abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">Homo</term></hi> </hi>. <term xml:lang=\"la\">Erectus</term>, <term xml:lang=\"la\">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        test.append("bbbbbbbbbbbbbbbbbbbbbb <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">sHomo</term></hi> </hi>. <term xml:lang=\"la\">sErectus</term>, <term xml:lang=\"la\">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                    Pattern.CASE_INSENSITIVE);
        Matcher regexMatcher = pattern.matcher(test.toString());
        System.out.println(test);
        test.delete(0, test.length());
        while (regexMatcher.find()) {

            //  test.delete(regexMatcher.start(),test.length());
            String matched = regexMatcher.group(0);
            Pattern termPatter = Pattern.compile("(<term xml:lang=\".*?\")(>)(.*?)(</term>)");

            Matcher termMatcher = termPatter.matcher(matched);

            if (termMatcher != null) {
                //termMatcher.start();
                System.out.println(termMatcher.groupCount());
                while (termMatcher.find()) {
                    System.out.println("0---" + termMatcher.group(0));
                    System.out.println(termMatcher.group(1));
                    System.out.println(termMatcher.group(2));
                    System.out.println(termMatcher.group(3));
                    System.out.println(termMatcher.group(4));

                    termMatcher.appendReplacement(test, appendSortKey(termMatcher.group(0), termMatcher.group(1), termMatcher.group(2), termMatcher.group(3), termMatcher.group(4)));

                }
                termMatcher.appendTail(test);
            }
            //regexMatcher.appendTail(test);
        }
        System.out.println(test);
    }

    private static String appendSortKey(String totStr, String termStart, String termStartEndTag, String termValue, String termEndTag) {
        // Inserts sortKey="<termValue>" into the opening <term> tag and rebuilds the element.
        if(totStr!=null){
            termStart = termStart+" "+"sortKey=\""+termValue+"\""+termStartEndTag;
            return termStart+termValue+termEndTag;
        }

        return null;
    }
}

I am trying to modify only the <term>...</term> elements, using the content matched by another regular expression (which acts as a condition), but I am losing the content at the beginning and the end. Please let me know what mistake I am making.

The expected output is:

abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="Homo">Homo</term></hi> </hi>. <term xml:lang="la" sortKey="Erectus">Erectus</term>, <term xml:lang="la" sortKey="bimanus">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>bbbbbbbbbbbbbbbbbbbbbb <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="sHomo">sHomo</term></hi> </hi>. <term xml:lang="la" sortKey="sErectus">sErectus</term>, <term xml:lang="la" sortKey="sbimanus">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>
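The reason the leading text disappears can be seen in isolation. appendReplacement copies the text between the previous append position and the current match from the matcher's own input, then appends the replacement. appendTail copies whatever follows the last match. The following is a minimal sketch that runs the same loop once over a whole document and once over a slice of it. The class AppendDemo and the <t> markup are hypothetical stand-ins for the question's classes and tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendDemo {

    // Rewrites every <t>x</t> in input to [x] with the appendReplacement/appendTail loop.
    static String rewrite(String input) {
        Matcher m = Pattern.compile("<t>(.*?)</t>").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies input text from the previous append position up to m.start(),
            // then appends the replacement ($1 refers to the captured group).
            m.appendReplacement(out, "[$1]");
        }
        // Copies whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "head <p><t>a</t> mid <t>b</t></p> tail";
        // Whole document: nothing outside the matches is lost.
        System.out.println(rewrite(doc));
        // Only the <p>...</p> slice: "head " and " tail" were never part of this
        // matcher's input, so they cannot appear in its output.
        String slice = doc.substring(doc.indexOf("<p>"), doc.indexOf("</p>") + 4);
        System.out.println(rewrite(slice));
    }
}
```

On the whole document this prints head <p>[a] mid [b]</p> tail. On the slice it prints only <p>[a] mid [b]</p>. This mirrors building termMatcher over matched rather than over the full text.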


Answers (1)

Denim Datta

Change the following line of the code and it will give you the correct result:

Pattern pattern = Pattern.compile(".*<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                Pattern.CASE_INSENSITIVE);
/*
instead of the following
Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                Pattern.CASE_INSENSITIVE);
*/
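To see what the modified pattern actually matches, here is a small sketch that counts matches of both patterns on a shortened stand-in for the question's input. The class GreedyPrefixDemo and the shortened input are hypothetical. Because .* is greedy, the first find() runs from the start of the string to the last matching </p>, so the whole input becomes a single match. That is why the change produces the expected output here. It also means that text after the last </p> would still be dropped, and <term> elements that sit outside any <p rendition="#indent-1"> but before the last one would be rewritten too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyPrefixDemo {

    // Shortened stand-in for the question's input: two paragraphs, each preceded by text.
    static final String INPUT =
            "aaa <p rendition=\"#indent-1\">1. <term xml:lang=\"la\">Homo</term></p>"
            + "bbb <p rendition=\"#indent-1\">1. <term xml:lang=\"la\">sHomo</term></p>";
    static final String ORIGINAL = "<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>";
    static final String PREFIXED = ".*" + ORIGINAL;

    // Counts how many times regex is found in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(ORIGINAL, INPUT)); // prints 2: one match per paragraph
        System.out.println(countMatches(PREFIXED, INPUT)); // prints 1: greedy .* spans both
        System.out.println(firstMatch(PREFIXED, INPUT).equals(INPUT)); // prints true
    }
}
```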

Explanation:

  • The pattern <p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p> matches only the <p ...> ... </p> part, so appendReplacement appends only that part, with the replacements applied.
  • The pattern .*<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p> also matches the text in front of the paragraph (Text <p ...> ... </p>), so after appendReplacement you get Text <p ...> ... </p> with the replacements applied.

Thus the output will be the whole string, with <term xml:lang="la">text</term> replaced by <term xml:lang="la" sortKey="text">text</term>.
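An alternative keeps the per-paragraph condition and does not rely on a greedy prefix. It rewrites the terms of each <p> match, hands the result to the outer matcher's own appendReplacement, and finishes with the outer matcher's appendTail. Because the outer matcher's input is the whole string, it copies the text before each paragraph and after the last one itself. Matcher.quoteReplacement is used because appendReplacement otherwise treats $ and \ in the replacement specially. This is a sketch under stated assumptions: the names SortKeyFix, addSortKeys and rewriteTerms are hypothetical, the two regular expressions are the question's own, and each <p> is assumed to sit on one line as in the question (otherwise Pattern.DOTALL would be needed).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortKeyFix {

    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TERM = Pattern.compile(
            "(<term xml:lang=\".*?\")(>)(.*?)(</term>)");

    // Adds sortKey="..." to every <term> inside a matching <p>, leaving all other text unchanged.
    static String addSortKeys(String input) {
        Matcher para = PARAGRAPH.matcher(input);
        StringBuffer out = new StringBuffer();
        while (para.find()) {
            String rewritten = rewriteTerms(para.group());
            // The outer matcher copies the text before this <p> itself;
            // quoteReplacement keeps '$' and '\' in the content literal.
            para.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        // Copies everything after the last matching </p>.
        para.appendTail(out);
        return out.toString();
    }

    // Rewrites <term xml:lang="..">x</term> to <term xml:lang=".." sortKey="x">x</term>.
    private static String rewriteTerms(String paragraph) {
        Matcher term = TERM.matcher(paragraph);
        StringBuffer out = new StringBuffer();
        while (term.find()) {
            String replacement = term.group(1) + " sortKey=\"" + term.group(3) + "\""
                    + term.group(2) + term.group(3) + term.group(4);
            term.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        term.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "text <p rendition=\"#indent-1\">1. <term xml:lang=\"la\">Homo</term></p> tail";
        System.out.println(addSortKeys(input));
    }
}
```

On the shortened input in main this should print text <p rendition="#indent-1">1. <term xml:lang="la" sortKey="Homo">Homo</term></p> tail, keeping both the leading and the trailing text. A <term> outside any matching <p> is left as it is.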
