user1222760

Reputation: 79

Android - xml ampersand conversion

I have a SAX parser handling an XML tag that contains the following text: "A & amp; B" (there is no space in the real data; I added one so that it is not rendered as & here).

It looks as though the ampersand is being converted twice and then treated as an escape, because the value I end up with is just "A". Here's the process:

The XML file is downloaded:

InputStream _inputStream = _urlConnection.getInputStream();
BufferedInputStream _bufferedInputStream = new BufferedInputStream(_inputStream);
ByteArrayBuffer _byteArrayBuffer = new ByteArrayBuffer(64);

int current = 0;
while ((current = _bufferedInputStream.read()) != -1)
{
    _byteArrayBuffer.append((byte) current);
}

FileOutputStream _fileOutputStream = openFileOutput(_file, MODE_PRIVATE);

_fileOutputStream.write(_byteArrayBuffer.toByteArray());
_fileOutputStream.close();

The data is captured with SAX in endElement:

else if (inLocalName.equalsIgnoreCase(_nodeTitle))
{
    _titleValue = currentValue;
    currentValue = "";
}

In the debugger, the ampersand is already converted and the data is already truncated by the time I read it in the handler's characters method.

I've seen a lot of questions about this but never a solution. Any ideas?

Thanks

Parser:

List<PropertiesList> _theList = null;

try
{
    // Create Factory, Parser, Reader, Handler
    SAXParserFactory _saxParserFactory = SAXParserFactory.newInstance();
    SAXParser _saxParser = _saxParserFactory.newSAXParser();
    XMLReader _xmlReader = _saxParser.getXMLReader();
    HandlerReps _handler = new HandlerReps(inRegion, inAbbreviation);

    _xmlReader.setContentHandler(_handler);
    _xmlReader.parse(new InputSource(inStream));

    _theList = _handler.getTheList();
}

Handler:

// Called when Tag Begins
    @Override
    public void startElement(String uri, String inLocalName, String inQName, Attributes inAttributes) throws SAXException 
    {
        currentElement = false;
    }

    // Called when Tag Ends
    @Override
    public void endElement(String inUri, String inLocalName, String inQName) throws SAXException 
    {
        currentElement = false;

        // End of a record node
        if (inLocalName.equalsIgnoreCase(_nodeValue))
        {
            if (_stateValue.equalsIgnoreCase(_abbreviation) && 
                _countryValue.equalsIgnoreCase(_region))
            {
                // Construct the object
                PropertiesRegion _regionObject = new PropertiesRegion(_titleValue, _address1Value);

                cList.add(_regionObject);

                Log.d(TAG, _regionObject.toString());
            }

            _titleValue = "";
            _address1Value = "";
        }

        // Title
        else if (inLocalName.equalsIgnoreCase(_nodeTitle))
        {
            _titleValue = currentValue;
            currentValue = "";
        }

        // Address1
        else if (inLocalName.equalsIgnoreCase(_nodeAddress1))
        {
            _address1Value = currentValue;
            currentValue = "";
        }
    }

    // Called to get Tag Characters
    @Override
    public void characters(char[] inChar, int inStart, int inLength) throws SAXException 
    {
        if (currentElement) 
        {
            currentValue = new String(inChar, inStart, inLength);
            currentElement = false;
        }
    }

Upvotes: 0

Views: 719

Answers (1)

This is very likely the cause of your problem:

    if (currentElement) 
    {
        currentValue = new String(inChar, inStart, inLength);
        currentElement = false;
    }

For a single text node, the SAX parser may deliver the content through several characters() calls, and parsers often split the text at entity references such as &amp;. You only get the whole text if you concatenate all of these calls. Your code keeps only the first one, because it then sets currentElement = false.
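
One way to do the concatenation is to collect the text in a StringBuilder and read it out in endElement. The following is only a minimal sketch, not your actual handler: the class name AccumulatingHandler, the parseTitle helper, and the "title" element name are made up for illustration, and it matches on qName because the default desktop parser factory is not namespace-aware (your Android code matches on the local name instead).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates every characters() call for an element.
class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder currentValue = new StringBuilder();
    private String titleValue = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element starts: discard text left over from before.
        currentValue.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so append instead of assigning.
        currentValue.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // "title" is an assumed element name for this example.
        if ("title".equalsIgnoreCase(qName)) {
            titleValue = currentValue.toString();
        }
        currentValue.setLength(0);
    }

    String getTitleValue() {
        return titleValue;
    }

    // Hypothetical helper: parses an XML string and returns the title text.
    static String parseTitle(String xml) throws Exception {
        AccumulatingHandler handler = new AccumulatingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitleValue();
    }

    public static void main(String[] args) throws Exception {
        // Prints "A & B" regardless of how the parser splits the text.
        System.out.println(parseTitle("<root><title>A &amp; B</title></root>"));
    }
}
```

The buffer is reset in startElement and again in endElement, so whitespace between elements does not leak into the next value. With this shape the boolean flag is no longer needed.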

The problem is not ampersand conversion. As a general rule, when you describe a problem, it is often better to only describe the symptoms, not any supposed causes.

Upvotes: 1
