Reputation: 1980
There is a bug in this C++ code: it replaces multiple whitespace characters between words with a single space, and I can't figure out where that happens. Whitespace between two words should be left as it is, not collapsed into one space. This is the method that deals with whitespace and blanks:
const char* TiXmlBase::SkipWhiteSpace( const char* p, TiXmlEncoding encoding )
{
    if ( !p || !*p )
    {
        return 0;
    }
    if ( encoding == TIXML_ENCODING_UTF8 )
    {
        while ( *p )
        {
            const unsigned char* pU = (const unsigned char*)p;
            if (    *(pU+0)==TIXML_UTF_LEAD_0
                 && *(pU+1)==TIXML_UTF_LEAD_1
                 && *(pU+2)==TIXML_UTF_LEAD_2 )
            {
                p += 3;
                continue;
            }
            else if(    *(pU+0)==TIXML_UTF_LEAD_0
                     && *(pU+1)==0xbfU
                     && *(pU+2)==0xbeU )
            {
                p += 3;
                continue;
            }
            else if(    *(pU+0)==TIXML_UTF_LEAD_0
                     && *(pU+1)==0xbfU
                     && *(pU+2)==0xbfU )
            {
                p += 3;
                continue;
            }
            if ( IsWhiteSpace( *p ) ) // Still using old rules for white space.
                p++;
            else
                break;
        }
    }
    else
    {
        while ( *p && IsWhiteSpace( *p ) )
        // while(*p)
            ++p;
    }
    return p;
}
Input:
<?xml version="1.0" standalone="no" ?>
<ToDo>
<bold>Toy store!</bold>
</ToDo>
Expected output:
<?xml version="1.0" standalone="no" ?>
<ToDo>
<bold>Toy store!</bold>
</ToDo>
Observed output:
<?xml version="1.0" standalone="no" ?>
<ToDo>
<bold>Toy store!</bold>
</ToDo>
Upvotes: 3
Views: 2570
Reputation: 1428
Try setting bool TiXmlBase::condenseWhiteSpace to false in the file tinyxml.cpp, or calling TiXmlBase::SetCondenseWhiteSpace(false) at runtime. The first worked for me.
This probably didn't exist in 2012, but it exists now.
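To illustrate what that flag changes, here is a minimal, self-contained sketch of whitespace condensing. It is not TinyXML's code: the function name ReadTextSketch and its condense parameter are hypothetical stand-ins for the library's behaviour, on the assumption that the collapsing happens while element text is read rather than in SkipWhiteSpace itself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical model of reading element text; NOT TinyXML's implementation.
// condense == true  : drop leading whitespace and collapse each inner run
//                     of whitespace to one space (trailing runs are dropped).
// condense == false : return the text unchanged.
std::string ReadTextSketch(const std::string& in, bool condense)
{
    if (!condense)
        return in;  // keep every whitespace character

    std::string out;
    bool pendingSpace = false;
    std::size_t i = 0;

    // Skip leading whitespace, similar in spirit to SkipWhiteSpace above.
    while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i])))
        ++i;

    for (; i < in.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(in[i]))) {
            pendingSpace = true;  // remember the run, emit nothing yet
        } else {
            if (pendingSpace) {
                out += ' ';       // a whole run becomes exactly one space
                pendingSpace = false;
            }
            out += in[i];
        }
    }
    return out;
}
```

With condense set to true, a run of spaces between "Toy" and "store!" comes back as a single space; with false, the text is returned untouched, which is the behaviour the question asks for.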
Upvotes: 0
Reputation: 392921
Switch to TinyXML-2:
Advantages of TinyXML-2
- The focus of all future dev.
- Many fewer memory allocations (1/10th to 1/100th), uses less memory (about 40% of TinyXML-1), and faster.
- No STL requirement.
- More modern C++, including a proper namespace.
- Proper and useful handling of whitespace
Microsoft has an excellent article on white space: http://msdn.microsoft.com/en-us/library/ms256097.aspx
TinyXML-2 preserves white space in a (hopefully) sane way that is almost compliant with the spec. (TinyXML-1 used a completely outdated model.)
As a first step, all newlines / carriage-returns / line-feeds are normalized to a line-feed character, as required by the XML spec.
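A minimal sketch of that normalization step, assuming it means turning each CR LF pair and each lone CR into a single LF. The function name NormalizeNewlines is hypothetical and is not part of TinyXML-2's API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map "\r\n" and lone "\r" to "\n".
std::string NormalizeNewlines(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;  // swallow the LF of a CR LF pair
        } else {
            out += in[i];
        }
    }
    return out;
}
```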
White space in text is preserved. For example:
<element> Hello,  World</element>
The leading space before the "Hello" and the double space after the comma are preserved. Line-feeds are preserved, as in this example:
<element> Hello again,
          World</element>
However, white space between elements is not preserved. Although not strictly compliant, tracking and reporting inter-element space is awkward, and not normally valuable. TinyXML-2 sees these as the same XML:
<document>
    <data>1</data>
    <data>2</data>
    <data>3</data>
</document>

<document><data>1</data><data>2</data><data>3</data></document>
Upvotes: 5