towi

Reputation: 22317

How to keep whitespace when reading an attribute value with libxml2

I use libxml2 to parse my XML configuration file. The newest feature request involves the "correct handling of meaningful whitespace", e.g. newlines should be kept.

Currently I get the attribute values with xmlGetProp.

I know that whitespace in attribute values is normally normalized by the XML parser, as the standard requires: each whitespace character is replaced with a space character and, for attribute types other than CDATA, runs of spaces are collapsed and leading and trailing spaces are stripped.

Is there a way to make sure that embedded newlines in attribute values are kept?

Upvotes: 0

Views: 1791

Answers (2)

Édouard Lopez

Reputation: 43421

Did you try the xml:space attribute or xmlNodeGetSpacePreserve()? For example:

<para xml:space="preserve">

See:

  1. xmlNodeGetSpacePreserve() @ LibXML documentation;
  2. XML to preserve the whitespace;
  3. White Space @ MSDN.
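
One caveat worth checking before relying on this: xml:space is a hint about whitespace in element content, while attribute-value normalization is done by the parser before the application sees the value. The sketch below is only an illustration. It uses Python's standard-library xml.etree.ElementTree (expat-based) as a stand-in rather than libxml2, and the helper name read_x is hypothetical. The assumption is that libxml2, as a conforming XML parser, treats the attribute the same way.

```python
import xml.etree.ElementTree as ET

# An element that carries xml:space="preserve" and an attribute value
# containing literal newlines.
DOC = '<a><b xml:space="preserve" x="1\n2\n3"/></a>'


def read_x(xml_text: str) -> str:
    """Parse xml_text and return the x attribute of the first <b> child."""
    root = ET.fromstring(xml_text)
    return root.find("b").get("x")


if __name__ == "__main__":
    # The literal newlines come back as spaces despite xml:space="preserve".
    print(repr(read_x(DOC)))
```

If the stand-in behaves like libxml2 here, xml:space alone will not keep newlines inside attribute values.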

Upvotes: 1

David Carlisle

Reputation: 5652

As you note, this is required by the XML spec, so there is no way in a DTD or Schema to stop the normalisation.

You can probably use libxml's HTML parser, though. Using its command-line xmllint utility with an input file of

<a>
<b x="1
2
3"/>
</a>

I get

$ xmllint abc.xml
<?xml version="1.0"?>
<a>
<b x="1 2 3"/>
</a>

so the newlines have gone, but:

$ xmllint --html abc.xml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><a>
<b x="1
2
3"></b>
</a></body></html>

The newlines are kept. The HTML parser adds spurious inferred html and body elements, but you could strip them after parsing in your application.
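
The normalisation seen in the plain XML run applies to literal whitespace in the attribute value. Per the XML 1.0 rules for attribute-value normalisation, a character reference such as &#10; is instead appended to the value as the referenced character. So if you control how the configuration file is written, escaping the newlines is another option. A minimal sketch, again assuming Python's standard-library xml.etree.ElementTree (expat-based) as a stand-in for libxml2, with a hypothetical helper read_x:

```python
import xml.etree.ElementTree as ET

# Literal newlines inside the attribute value, as in the xmllint example.
LITERAL = '<a><b x="1\n2\n3"/></a>'
# The same value with each newline written as the character reference &#10;.
ESCAPED = '<a><b x="1&#10;2&#10;3"/></a>'


def read_x(xml_text: str) -> str:
    """Parse xml_text and return the x attribute of the first <b> child."""
    root = ET.fromstring(xml_text)
    return root.find("b").get("x")


if __name__ == "__main__":
    print(repr(read_x(LITERAL)))  # literal newlines are normalised to spaces
    print(repr(read_x(ESCAPED)))  # referenced newlines survive parsing
```

With libxml2 itself the equivalent check would be to read the escaped file and inspect the string returned by xmlGetProp. That step is not shown here and is worth verifying against your libxml2 version.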

Upvotes: 1
