Reputation: 3147
I have a String that holds XML data. After removing some nodes and adding a few others, the XML contains a lot of whitespace (left behind by the node removal).
<A>
<B>
</B>
<!-- some node i deleted and lot of white spaces -->
<c>
</c>
<!-- some more node i deleted and lot of white spaces -->
<E>
</E>
</A>
Desired output after String manipulation:
<A>
<B>
</B>
<c>
</c>
<E>
</E>
</A>
I can use replaceAll("\\s", ""), but this also removes the newline characters and flattens the XML, which breaks its layout when it is displayed in the UI.
Is there a way to trim it without trimming the new line character?
Edit: This XML data is part of an OMElement.
Upvotes: 3
Views: 3653
Reputation: 3147
There is a costly way of doing this.
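import java.util.Scanner;

// Copy only the non-blank lines of str into a new buffer.
Scanner scanner = new Scanner(str);
StringBuilder strBuff = new StringBuilder();
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // Skip lines that are empty or contain only whitespace.
    if (!line.trim().isEmpty()) {
        strBuff.append(line).append("\n");
    }
}
scanner.close();
String result = strBuff.toString();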
When the loop ends, the buffer holds the XML without the empty lines, and the XML is still well formed. This is not ideal for large XML, since a new String object is created for every line.
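For what it's worth, the same line-by-line filter can be written more compactly. This is only a sketch, assuming Java 11 or later (for String.lines() and String.isBlank()); the class and method names are made up for illustration. It still creates one String per line, so it is shorter rather than cheaper.

```java
import java.util.stream.Collectors;

final class BlankLineFilter {
    // Keeps every line that contains at least one non-whitespace character
    // and joins the surviving lines with a single "\n".
    static String removeBlankLines(String xml) {
        return xml.lines()
                  .filter(line -> !line.isBlank())
                  .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Hypothetical sample modelled on the XML in the question.
        String xml = "<A>\n<B>\n</B>\n   \n\n<c>\n</c>\n</A>";
        System.out.println(removeBlankLines(xml));
    }
}
```

Indentation inside the remaining lines is left untouched; only whitespace-only lines are dropped.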
Regards
Dheeraj Joshi
Upvotes: 0
Reputation: 19185
If you are using DocumentBuilder to modify the XML, you can also use the method below.
DocumentBuilderFactory.setIgnoringElementContentWhitespace
Specifies that the parsers created by this factory must eliminate whitespace in element content (sometimes known loosely as 'ignorable whitespace')
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
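One caveat: per the Javadoc, this setting relies on the content model, so the parser has to be in validating mode and needs a DTD that says which elements hold only child elements. Below is a rough sketch of how that could look. The inline DTD, the sample document, and the class and method names are all assumptions for illustration, not something from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class IgnorableWhitespaceDemo {
    // Hypothetical sample: the inline DTD declares that <A> holds only child elements,
    // so whitespace between those children counts as element content whitespace.
    static final String SAMPLE =
        "<!DOCTYPE A [\n"
        + "  <!ELEMENT A (B, c, E)>\n"
        + "  <!ELEMENT B (#PCDATA)>\n"
        + "  <!ELEMENT c (#PCDATA)>\n"
        + "  <!ELEMENT E (#PCDATA)>\n"
        + "]>\n"
        + "<A>\n<B></B>\n   \n\n<c></c>\n\n<E></E>\n</A>";

    // Parses the XML with validation on and element content whitespace ignored.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts the direct child nodes of the root element (elements and text nodes).
    static int rootChildCount(String xml) {
        return parse(xml).getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(rootChildCount(SAMPLE));
    }
}
```

With the DTD in place I would expect the root to have only its three element children and no whitespace-only text nodes in between. Without a DTD the parser cannot tell which whitespace is ignorable, so the setting has no visible effect.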
Upvotes: 1
Reputation: 29656
Can you clarify what you mean? If you mean whitespace other than new-lines, try as follows.
str = str.replaceAll("[ \\t\\x0B\\f\\r]", "");
... or, do you instead mean you want to remove extraneous new lines?
str = str.replaceAll("\n{2,}", "\n");
... or do you want to remove only literal ' ' spaces?
str = str.replace(" ", "");
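If the goal is the output shown in the question (drop the whitespace-only lines, keep the line breaks and any indentation), the options above can be combined into one multiline regex. This is a sketch under that assumption; the class and method names are hypothetical.

```java
final class BlankLineRegex {
    // (?m) makes ^ match at the start of every line.
    // The pattern matches a line made up only of spaces, tabs, vertical tabs or
    // form feeds, together with its line terminator, and removes it.
    static String stripBlankLines(String xml) {
        return xml.replaceAll("(?m)^[ \\t\\x0B\\f]*\\r?\\n", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample modelled on the XML in the question.
        String xml = "<A>\n<B>\n</B>\n   \n\n<c>\n</c>\n</A>\n";
        System.out.print(stripBlankLines(xml));
    }
}
```

A whitespace-only last line with no trailing line break is not matched by this pattern, so it would need a separate trim.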
Upvotes: 3
Reputation: 6844
I suggest using the regex str.replaceAll("(</[^>]+>)\\s+(<[^>]+>)", "$1\n$2"),
which detects the whitespace between a closing tag and the next tag and removes it, leaving only a single line break.
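Here is a small sketch of that regex in action, next to a variant of my own (not part of the answer above) that uses a lookahead instead of capturing the following tag. The difference shows up when several closing tags follow each other: the original pattern consumes the second tag, so the gap after it is skipped. Class and method names are hypothetical.

```java
final class TagGapCollapser {
    // The regex as suggested: whitespace between a closing tag and the next tag
    // is replaced by a single line break.
    static String collapse(String xml) {
        return xml.replaceAll("(</[^>]+>)\\s+(<[^>]+>)", "$1\n$2");
    }

    // Variant (an assumption, not from the answer): the lookahead (?=<) checks
    // for the next tag without consuming it, so back-to-back closing tags are
    // each handled.
    static String collapseWithLookahead(String xml) {
        return xml.replaceAll("(</[^>]+>)\\s+(?=<)", "$1\n");
    }

    public static void main(String[] args) {
        // Hypothetical sample with two gaps in a row.
        String xml = "</x>\n\n</y>\n\n</z>";
        System.out.println(collapse(xml));
        System.out.println(collapseWithLookahead(xml));
    }
}
```

Neither version touches whitespace that follows an opening tag, so blank lines right after something like <A> would remain.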
Upvotes: 2
Reputation: 4971
Try someString.replaceAll("\\u0020", "").
This string is the Unicode escape for the space character and should do the job.
Edit: if you need to match other whitespace characters, take a look at this question; you will find more in tchrist's answer.
Upvotes: 2