FoamyGuy

Reputation: 46856

Parsing XML dropping all characters after &

I am creating an app that parses some XML and displays it in a ListView. A few items in my XML contain ampersands, so I have escaped them as &amp;. This works correctly on a few devices and on the emulator.

But on two devices (Samsung Sidekick 4G on Android 2.2, and Samsung Replenish on Android 2.3.6) it fails: everything after the & disappears.

Here is the item in the XML giving me trouble:

<site>
    <name>English Language &amp; Usage</name>
    <link>http://english.stackexchange.com/</link>
    <about>English Language &amp; Usage Stack Exchange is a question and answer site for linguists, etymologists, and serious English language enthusiasts.</about>
    <image>https://dl.dropboxusercontent.com/u/5724095/XmlParseExample/english.png</image>
</site>

Here is the "meat" of the parsing code:

private static String getValue(Element item, String str) {
    NodeList n = item.getElementsByTagName(str);
    Log.i("StackSites", ""+getElementValue(n.item(0)));
    return getElementValue(n.item(0));
}

private static String getElementValue(Node elem) {
    Node child;
    if (elem != null) {
        if (elem.hasChildNodes()) {
            for (child = elem.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    return child.getNodeValue();
                }
            }
        }
    }
    return "";
}

On some devices (LG Optimus G, Motorola Atrix 2, and a few emulators) this works correctly and the list shows the full text, "English Language & Usage".

However, on the two Samsung devices that I've tried, getValue() returns only the text that comes before the &amp;, so the list shows just "English Language".

Upvotes: 2

Views: 166

Answers (3)

FoamyGuy

Reputation: 46856

CommonsWare got me pointed in the right direction.

I changed the getElementValue() method to this:

private static String getElementValue(Node elem) {
    StringBuilder value = new StringBuilder();
    Node child;
    if (elem != null) {
        if (elem.hasChildNodes()) {
            // Append every text node instead of returning at the first one.
            for (child = elem.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    value.append(child.getNodeValue());
                }
            }
            return value.toString();
        }
    }
    return "";
}

and it gets the second half of the text correctly now.

Upvotes: 2

Ted Hopp

Reputation: 234847

This is a known bug on some Android releases. It was fixed in Honeycomb (3.0).

There's no good work-around. You need to process the text as [text node] [entity node] [text node], interpret the entity reference yourself, and concatenate the results.
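A minimal sketch of that text/entity/text approach, in case it helps. The class name EntityAwareText and the resolveEntity() helper are made up for this example, and it assumes the affected parser exposes the entity as an ENTITY_REFERENCE_NODE whose node name is the entity name (for example "amp"); I have not checked that on the devices in question. The demo() method builds the three nodes by hand so the logic can be run on a desktop JVM.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of the original app.
final class EntityAwareText {

    // Maps the five predefined XML entity names to their characters.
    static String resolveEntity(String name) {
        if ("amp".equals(name)) return "&";
        if ("lt".equals(name)) return "<";
        if ("gt".equals(name)) return ">";
        if ("quot".equals(name)) return "\"";
        if ("apos".equals(name)) return "'";
        return ""; // assumption: silently drop entities we do not know
    }

    // Concatenates text nodes and resolved entity references directly under elem.
    static String getElementValue(Node elem) {
        if (elem == null) {
            return "";
        }
        StringBuilder value = new StringBuilder();
        for (Node child = elem.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                value.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                value.append(resolveEntity(child.getNodeName()));
            }
        }
        return value.toString();
    }

    // Builds <name> by hand as [text node] [entity node] [text node].
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("English Language "));
            name.appendChild(doc.createEntityReference("amp"));
            name.appendChild(doc.createTextNode(" Usage"));
            return getElementValue(name);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the parser instead delivers the "&" as an ordinary text node, the entity branch is simply never taken and the plain concatenation still produces the full string.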

Alternatively, you can avoid XML entity references altogether and substitute your own escape sequences. As long as the parser doesn't see a &, the problem is avoided.
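A rough sketch of the custom-escape idea. The "{amp}" token and the CustomEscape class are invented for this example; any token that cannot occur in your real data would do. You would run escape() when generating the XML file and unescape() on each value the parser hands back.

```java
// Hypothetical helper; the "{amp}" token is made up for this example.
final class CustomEscape {

    static final String AMP_TOKEN = "{amp}";

    // Apply to the source data before it is written into the XML file,
    // so the parser never encounters an ampersand.
    static String escape(String raw) {
        return raw.replace("&", AMP_TOKEN);
    }

    // Apply to each value after the parser returns it.
    static String unescape(String parsed) {
        return parsed.replace(AMP_TOKEN, "&");
    }
}
```

With this in place the <name> element would contain "English Language {amp} Usage" in the file, and getValue() would wrap its result in CustomEscape.unescape(...).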

Upvotes: 3

CommonsWare

Reputation: 1007474

That's because you aren't looking at the rest of the nodes. The entity gets a different node, and the text following the entity gets the node after that. You are returning immediately -- you need to concatenate your results.
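If your platform's DOM supports it, Node.getTextContent() does this concatenation for you, since it returns the combined text of all descendant nodes. Here is a small sketch that parses the <name> element from the question on a desktop JVM; the class name TextContentDemo is made up, and I have not verified how getTextContent() behaves on the affected Android versions, so treat it as something to try rather than a guaranteed fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical demo class, not part of the original app.
final class TextContentDemo {

    // Returns the combined text of the first element with the given tag name.
    static String firstValue(Document doc, String tag) {
        Node n = doc.getElementsByTagName(tag).item(0);
        return n == null ? "" : n.getTextContent();
    }

    // Parses a cut-down version of the XML from the question.
    static String demo() {
        String xml = "<site><name>English Language &amp; Usage</name></site>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return firstValue(doc, "name");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```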

Upvotes: 3
