Reputation: 3603
I'm parsing a big XML file containing French text in Swift, and some whitespace is randomly removed when I retrieve node values from it.
Here's the file (open source, more than 5 MB): https://svn.code.sf.net/p/javacrim/code/littre/xml/a.xml
Whitespace is removed at random, always just before accented characters. For instance, here's a line from the XML:
<dictScrap>Entre un substantif et un verbe. L'exhortation <oVar>à</oVar> combattre. L'encouragement <oVar>à</oVar> bien vivre. La disposition <oVar>à</oVar> plaisanter. La promptitude <oVar>à</oVar> faire. L'habileté <oVar>à</oVar> parler. La facilité <oVar>à</oVar> comprendre. La répugnance <oVar>à</oVar> venir. Le plaisir <oVar>à</oVar> obéir. La fermeté <oVar>à</oVar> soutenir la vérité. La honte <oVar>à</oVar> mentir.</dictScrap>
And here's the line after parsing:
Optional("Entre un substantif et un verbe. L\'exhortationà combattre. L\'encouragement à bien vivre. La disposition à plaisanter. La promptitude à faire. L\'habileté à parler. La facilité à comprendre. La répugnance à venir. Le plaisir à obéir. La fermeté à soutenir la vérité. La honte à mentir.")
Notice that the space before the first à disappeared, but the others did not.
I removed all the <oVar> tags with search-and-replace in my editor; I didn't need them and got lazy trying to do it in code.
I'm using AEXML for parsing: https://github.com/tadija/AEXML
Here's my code, which does nothing but take the string and print it:
if sense["dictScrap"].count > 0 {
senseEntity.value = sense["dictScrap"].value
}
println(senseEntity.value)
Thanks for your help!
Upvotes: 2
Views: 1300
Reputation: 10336
I had a similar problem with the newest version of the library. What fixed it for me was creating the AEXMLDocument with the shouldTrimWhitespace option set to false:
var options = AEXMLOptions()
options.parserSettings.shouldTrimWhitespace = false
let xml = try? AEXMLDocument(xml: response, options: options)
Upvotes: 1
Reputation: 3261
I did some testing with your example, and you're right about the fix.
This happens because of NSXMLParser behaviour that I didn't encounter with my own XML data while creating AEXML: I didn't have characters like 'à', which cause parser(_:foundCharacters:) to be called multiple times for a single text node.
So, this fix is included in AEXML now, thanks for the feedback!
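A minimal sketch of why multiple parser(_:foundCharacters:) calls break per-chunk trimming, written in current Swift rather than the Swift 1.x used in this thread. It does not call the parser at all; it assumes the text node arrives in two pieces split just before "à" (the real split points are chosen by the parser). Trimming every piece, as the original AEXML callback did, glues the words together; trimming the accumulated text once keeps the inner space.

```swift
import Foundation

// Assumed split: one text node delivered in two foundCharacters callbacks.
// The parser decides the real split points; this one is for illustration only.
let chunks = ["L'exhortation ", "à combattre."]

// What the original AEXML callback effectively did: trim each chunk before appending.
func trimEachChunk(_ chunks: [String]) -> String {
    var value = ""
    for chunk in chunks {
        value += chunk.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    return value
}

// The fix: append raw chunks, then trim the complete value once.
func trimOnce(_ chunks: [String]) -> String {
    return chunks.joined().trimmingCharacters(in: .whitespacesAndNewlines)
}

print(trimEachChunk(chunks)) // L'exhortationà combattre.
print(trimOnce(chunks))      // L'exhortation à combattre.
```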
Upvotes: 2
Reputation: 3603
So I had a look at the AEXML source code and found this function, which is responsible for the issue:
func parser(parser: NSXMLParser, foundCharacters string: String) {
currentValue += string.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
currentElement?.value = currentValue
}
When I remove the .stringByTrimming... call, whitespace isn't removed anymore.
If I then call .stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet()) on the parsed value, the final strings look the way I wanted.
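The same approach can be sketched with Foundation's XMLParser directly, outside AEXML: accumulate the raw text from every foundCharacters callback and trim only once, when the element closes. ValueCollector is a hypothetical name, and the sketch assumes flat elements with no mixed content (as in the file after the <oVar> tags were removed); nested elements would need a stack of buffers.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical delegate: collects one trimmed value per closed element.
final class ValueCollector: NSObject, XMLParserDelegate {
    private var buffer = ""
    private(set) var values: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String]) {
        buffer = ""  // assumes no mixed content: text before a child is discarded
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Do not trim here: one text node may arrive in several pieces.
        buffer += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        // Trim once, on the complete value.
        values.append(buffer.trimmingCharacters(in: .whitespacesAndNewlines))
        buffer = ""
    }
}

let xml = "<dictScrap>  L'exhortation à combattre. L'habileté à parler.  </dictScrap>"
let xmlParser = XMLParser(data: Data(xml.utf8))
let collector = ValueCollector()
xmlParser.delegate = collector
let ok = xmlParser.parse()
print(collector.values)
```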
Thanks all!
Upvotes: 1