varkashy

Reputation: 385

How to effectively remove immediate tags in XML String in Java

I have an XML document that is being handled as a string. The basic structure is something like this:

    <envelope>
        <body>
            <entity1>
                <tag1>
                 .
                 .
                </tag1>
                <tag2>
                 .
                 .
                </tag2>
            </entity1>
            <entity2>
                <tag1>
                 .
                 .
                </tag1>
                <tag2>
                 .
                 .
                </tag2>
            </entity2>

I need to remove certain tags, let's say tag2, i.e. the whole <tag2>..</tag2> block. I am doing this with a while loop, something like:

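    while(str.indexOf("<tag2>")>=0)
    {
       // the block from <tag2> through </tag2>; 7 is the length of "</tag2>"
       strRepl=str.substring(str.indexOf("<tag2>"),str.indexOf("</tag2>")+7);
       // replace() matches strRepl literally; replaceFirst() would treat it as a regex
       str=str.replace(strRepl,"");
    }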

This works, but I wanted to understand whether there is a better way to implement this using strings. Please suggest.

Upvotes: 0

Views: 3260

Answers (1)

Runcorn

Reputation: 5224

You could use a regular expression for that. Java provides the Pattern and Matcher classes (in java.util.regex), which can do the job for you.

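    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String yourString = "<envelope><body><entity1></entity1></body></envelope>";
    // Lazily match everything from <body> through the first </body>
    String REGULAR_EXPRESSION = "(\\<body>.+?\\</body>)";
    // DOTALL lets '.' match line breaks as well
    Pattern pattern = Pattern.compile(REGULAR_EXPRESSION, Pattern.DOTALL | Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(yourString);
    if (matcher.find()) {
       // replace() treats the matched text literally, not as a regex
       System.out.println(yourString.replace(matcher.group(1), ""));
    }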

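Here (\\<body>.+?\\</body>) matches the <body> element together with everything inside it, tags included. matcher.group(1) returns the text captured by the first group (here, the whole matched block), not its position; matcher.start() and matcher.end() give the position.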
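To make the difference between the captured text and its position concrete, here is a minimal sketch. The class name GroupDemo and the method describeFirstBody are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns "text@start-end" for the first
    // <body>...</body> block, or null if the input contains none.
    static String describeFirstBody(String xml) {
        Pattern pattern = Pattern.compile("(<body>.+?</body>)", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(xml);
        if (!matcher.find()) {
            return null;
        }
        // group(1) is the captured text; start(1)/end(1) are its offsets in xml.
        return matcher.group(1) + "@" + matcher.start(1) + "-" + matcher.end(1);
    }

    public static void main(String[] args) {
        String xml = "<envelope><body><entity1></entity1></body></envelope>";
        // prints <body><entity1></entity1></body>@10-42
        System.out.println(describeFirstBody(xml));
    }
}
```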

If you want to replace all occurrences, simply use:

    yourString = matcher.replaceAll("");

To replace only the first occurrence, use:

    yourString = matcher.replaceFirst("");

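Because the pattern above is compiled with Pattern.DOTALL, "." also matches line breaks ("\n"), so the Matcher version handles content that spans several lines. String.replaceAll compiles the expression without any flags, so the one-liner below works as written only when the block sits on a single line; for multi-line content, prefix the expression with the inline flag (?s):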

    System.out.println(yourString.replaceAll(REGULAR_EXPRESSION, ""));
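Applied to the tag2 case from the question, the same idea might look like the sketch below. TagRemover and removeTag are hypothetical names, and the code assumes the tag carries no attributes and is never nested inside itself. The inline flag (?s) is the in-pattern equivalent of Pattern.DOTALL, so blocks that span several lines are removed as well.

```java
import java.util.regex.Pattern;

class TagRemover {
    // Hypothetical helper: removes every <tag>...</tag> block, including
    // blocks that span several lines.
    // Assumes the tag has no attributes and is never nested inside itself.
    static String removeTag(String xml, String tag) {
        // Pattern.quote keeps any regex metacharacters in the tag name literal.
        String open = Pattern.quote("<" + tag + ">");
        String close = Pattern.quote("</" + tag + ">");
        // (?s) lets '.' match line breaks; .*? stops at the first closing tag.
        return xml.replaceAll("(?s)" + open + ".*?" + close, "");
    }

    public static void main(String[] args) {
        String xml = "<entity1>\n<tag1>a</tag1>\n<tag2>\nb\n</tag2>\n</entity1>";
        System.out.println(removeTag(xml, "tag2"));
    }
}
```

The line breaks around a removed block stay in place, so the result can contain empty lines where tag2 used to be.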

Upvotes: 1
