Skoua

Reputation: 3603

Swift XML parser randomly removes white spaces

I'm parsing a big XML file containing French text in Swift, and some white spaces are randomly removed when I retrieve node values from it.

Here's the file (open source, more than 5 MB): https://svn.code.sf.net/p/javacrim/code/littre/xml/a.xml

Seemingly at random, white spaces are removed just before accented characters. For instance, here's a line from the XML:

<dictScrap>Entre un substantif et un verbe. L'exhortation <oVar>à</oVar> combattre. L'encouragement <oVar>à</oVar> bien vivre. La disposition <oVar>à</oVar> plaisanter. La promptitude <oVar>à</oVar> faire. L'habileté <oVar>à</oVar> parler. La facilité <oVar>à</oVar> comprendre. La répugnance <oVar>à</oVar> venir. Le plaisir <oVar>à</oVar> obéir. La fermeté <oVar>à</oVar> soutenir la vérité. La honte <oVar>à</oVar> mentir.</dictScrap>

And here's the line after parsing:

Optional("Entre un substantif et un verbe. L\'exhortationà combattre. L\'encouragement à bien vivre. La disposition à plaisanter. La promptitude à faire. L\'habileté à parler. La facilité à comprendre. La répugnance à venir. Le plaisir à obéir. La fermeté à soutenir la vérité. La honte à mentir.")

Notice that the white space before the first à disappeared, but the ones before the other occurrences did not.

I removed all the <oVar> tags with a search-and-replace in my editor; I didn't need them and got lazy about doing it in code.

I'm using AEXML for parsing: https://github.com/tadija/AEXML

Here's my code, which does nothing but take the string and print it:

if sense["dictScrap"].count > 0 {
    senseEntity.value = sense["dictScrap"].value
}

println(senseEntity.value)

Thanks for your help!

Upvotes: 2

Views: 1300

Answers (3)

Leszek Szary

Reputation: 10336

I had a similar problem with the newest version of the library. What fixed it for me was creating the AEXMLDocument with the shouldTrimWhitespace option set to false:

var options = AEXMLOptions()
options.parserSettings.shouldTrimWhitespace = false
let xml = try? AEXMLDocument(xml: response, options: options)

Upvotes: 1

tadija

Reputation: 3261

I did some testing with your example, and you're right about the fix you provided.

This happens because of the NSXMLParser behaviour which I obviously didn't encounter with my XML data while creating AEXML (didn't have characters like 'à', which cause parser(_:foundCharacters:) to be called multiple times).

So, this fix is included in AEXML now, thanks for the feedback!
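To illustrate why trimming inside parser(_:foundCharacters:) can drop a space in the middle of a sentence, here is a minimal sketch in current Swift syntax. The chunk boundaries below are an assumption (the parser gives no guarantee about where it splits text), and joinTrimmingEachChunk and joinThenTrim are hypothetical helper names, not part of AEXML.

```swift
import Foundation

// Hypothetical chunks: one text node delivered in two callbacks,
// split just before the accented character (assumed split point).
let chunks = ["  L'exhortation ", "à combattre.\n"]

// Trims every chunk before appending, as the original handler did.
// The trailing space of the first chunk is lost.
func joinTrimmingEachChunk(_ chunks: [String]) -> String {
    var value = ""
    for chunk in chunks {
        value += chunk.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    return value
}

// Appends the raw chunks and trims only once, at the very end.
// Inner spaces survive; only leading/trailing whitespace is removed.
func joinThenTrim(_ chunks: [String]) -> String {
    return chunks.joined().trimmingCharacters(in: .whitespacesAndNewlines)
}

print(joinTrimmingEachChunk(chunks))
print(joinThenTrim(chunks))
```

With these assumed chunks, the first function glues "L'exhortation" directly to "à", which matches the symptom in the question, while the second keeps the space.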

Upvotes: 2

Skoua

Reputation: 3603

So I had a look at the AEXML source code and found the function responsible for the issue:

func parser(parser: NSXMLParser, foundCharacters string: String) {
    currentValue += string.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
    currentElement?.value = currentValue
}

When I remove the .stringByTrimmingCharactersInSet(...) call, white spaces aren't removed anymore.

If I then call .stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet()) on the parsed value, the final strings look the way I wanted.
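For reference, a minimal sketch of that idea using Foundation's XMLParser (the current name of NSXMLParser) directly: append raw text in foundCharacters and trim once when the element ends. TextCollector is a hypothetical class written for this example, not AEXML's implementation, and it assumes a flat document without nested elements.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // needed for XMLParser on Linux
#endif

// Hypothetical delegate: collects the trimmed text of each element.
final class TextCollector: NSObject, XMLParserDelegate {
    private var buffer = ""
    private(set) var values: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        buffer = "" // start a fresh buffer for this element
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        buffer += string // may be called several times; do not trim here
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        // Trim once, after all chunks of the text have been appended.
        values.append(buffer.trimmingCharacters(in: .whitespacesAndNewlines))
        buffer = ""
    }
}

let xml = "<dictScrap>  L'exhortation à combattre.  </dictScrap>"
let collector = TextCollector()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = collector
parser.parse()
print(collector.values)
```

Trimming in didEndElement means the result no longer depends on how the parser happens to split the text between callbacks.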

Thanks all!

Upvotes: 1
