Dheeraj Joshi

Reputation: 3147

Remove only white spaces from a String

I have a String that contains XML data. After removing some nodes and adding a few others, the XML has a lot of whitespace in it (created during the node removal).

<A>
<B>
</B>

<!-- some node i deleted and lot of white spaces -->



<c>
</c>


<!-- some more node i deleted and lot of white spaces -->




<E>
</E>
</A>

Desired output after String manipulation

<A>
<B>
</B>
<c>
</c>
<E>
</E>
</A>

I can use replaceAll("\\s", ""), but this also removes the newline characters and breaks the structure of the XML when it is displayed in the UI.

Is there a way to trim it without trimming the new line character?

Edit: This XML data is part of an OMElement.

Upvotes: 3

Views: 3653

Answers (5)

Dheeraj Joshi

Reputation: 3147

There is a costly way of doing this.

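import java.util.Scanner;

Scanner scanner = new Scanner(str);
StringBuilder result = new StringBuilder();
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // Skip lines that are empty or contain only whitespace.
    if (!line.trim().isEmpty()) {
        // Add the separator before every kept line except the first,
        // so the result does not start with a newline.
        if (result.length() > 0) {
            result.append("\n");
        }
        result.append(line);
    }
}
scanner.close();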

When the loop ends, the buffer holds the XML without the empty lines, and the remaining line breaks are preserved. As you can see, this is not ideal for large XML, since a new String object is created for every line.

Regards
Dheeraj Joshi

Upvotes: 0

Amit Deshpande

Reputation: 19185

If you are using DocumentBuilder to modify the XML, you can also use the method below.

DocumentBuilderFactory.setIgnoringElementContentWhitespace

Specifies that the parsers created by this factory must eliminate whitespace in element content (sometimes known loosely as 'ignorable whitespace')

factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
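A minimal sketch of how this setting behaves, assuming the document declares a DTD. The parser can only tell which whitespace is ignorable from the element content model, which is why validation is switched on. The class name IgnorableWhitespaceDemo, the method countRootChildren, and the sample XML with its internal DTD are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnorableWhitespaceDemo {

    // Hypothetical sample: the internal DTD declares that <A> contains only
    // elements, so the whitespace between its children is element content whitespace.
    static final String XML =
        "<!DOCTYPE A [\n"
        + "<!ELEMENT A (B, c)>\n"
        + "<!ELEMENT B EMPTY>\n"
        + "<!ELEMENT c EMPTY>\n"
        + "]>\n"
        + "<A>\n\n\n<B/>\n\n\n\n<c/>\n\n</A>";

    // Parses the sample and returns how many child nodes the root element has.
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("children, whitespace kept:    " + countRootChildren(false));
        System.out.println("children, whitespace ignored: " + countRootChildren(true));
    }
}
```

With the setting enabled, the root should report only its two element children; without it, the whitespace text nodes between them are counted as well. If the XML has no DTD, the parser has no content model to consult and the whitespace is likely to stay.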

Upvotes: 1

obataku

Reputation: 29656

Can you clarify what you mean? If you mean whitespace other than new-lines, try as follows.

str = str.replaceAll("[ \t\\x0B\f\r]", "");

... or, do you instead mean you want to remove extraneous new lines?

str = str.replaceAll("\n{2,}", "\n");

... or do you want to remove only literal ' ' spaces?

str = str.replace(" ", "");
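If the goal is the output shown in the question, the second variant comes closest, but it leaves lines that contain only spaces or tabs. A minimal sketch that removes whole blank lines instead, assuming the XML is already held in a String; the class and method names are invented for illustration.

```java
public class BlankLineRemover {

    // (?m) makes ^ match at the start of every line. The pattern matches a line
    // that holds only spaces or tabs, together with its line terminator.
    static String removeBlankLines(String xml) {
        return xml.replaceAll("(?m)^[ \\t]*\\r?\\n", "");
    }

    public static void main(String[] args) {
        // Sample shaped like the question's XML, with empty and whitespace-only lines.
        String xml = "<A>\n<B>\n</B>\n\n  \n\n<c>\n</c>\n\t\n\n<E>\n</E>\n</A>";
        System.out.println(removeBlankLines(xml));
    }
}
```

Lines that contain a tag are left untouched, including any indentation they carry, so only the gaps left by the deleted nodes disappear.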

Upvotes: 3

Gaim

Reputation: 6844

I suggest using the regex str.replaceAll("(</[^>]+>)\\s+(<[^>]+>)","$1\n$2"), which detects the whitespace between a closing tag and the next tag and replaces it with a single line break.
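A short sketch of that expression applied to XML shaped like the question's sample. The class name, method names, and sample string are invented for illustration. One caveat: because the second group consumes the following tag, two closing tags in a row can leave the gap after the second one untouched. The lookahead variant below is a suggested adjustment, not part of the original expression.

```java
public class TagGapCollapser {

    // The expression as suggested: whitespace between a closing tag and the next tag
    // is replaced by a single newline.
    static String collapse(String xml) {
        return xml.replaceAll("(</[^>]+>)\\s+(<[^>]+>)", "$1\n$2");
    }

    // Suggested variant (an assumption, not the original expression): the lookahead
    // does not consume the next tag, so consecutive closing tags are each handled.
    static String collapseWithLookahead(String xml) {
        return xml.replaceAll("(</[^>]+>)\\s+(?=<)", "$1\n");
    }

    public static void main(String[] args) {
        String xml = "<A>\n<B>\n</B>\n\n\n\n<c>\n</c>\n\n\n<E>\n</E>\n</A>";
        System.out.println(collapse(xml));
    }
}
```

Both versions only touch whitespace that follows a closing tag, so gaps directly after an opening tag are left as they are.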

Upvotes: 2

Matthias Kricke

Reputation: 4971

Try someString.replaceAll("\\u0020",""). \u0020 is the Unicode escape for the space character, so this removes only literal spaces and should do the job.

Edit: If you need other whitespace characters, take a look at this question. You will find them in tchrist's answer.

Upvotes: 2
