Reputation: 9484
I am trying to parse an XML file using NSXMLParser. Everything seems to work at first, but some of the content gets truncated and I get some weird results.
func parser(parser: NSXMLParser!, didStartElement elementName: String!, namespaceURI: String!, qualifiedName qName: String!, attributes attributeDict: [NSObject : AnyObject]!) {
    if elementName == "title" {
        foundTitle = true
    }
    if elementName == "description" {
        foundDescription = true
    }
}
func parser(parser: NSXMLParser!, foundCharacters string: String!) {
    if (foundItem) {
        if foundTitle {
            println("Title: \(string)")
            foundTitle = false
        }
        else if foundDescription {
            println("Description: \(string)")
            foundDescription = false
        }
    }
}
The RSS feed I am testing on is This Day in Tech History (http://feedpress.me/ThisDayInTechHistory), and right now the first item reads:
Title: IBM’s First Desktop Computer
Description: IBM introduces their System/23 Datamaster desktop computer...
But this is what my test printed:
Title: IBM
Description: ’s First Desktop Computer
Description: July 28, 1981 IBM introduces their System/23 Datamaster desktop computer...
Note that the title was cut off at the first ’ and the rest was printed as a description! Is this a bug in NSXMLParser, or what have I done wrong? Thanks!
Upvotes: 2
Views: 9646
Reputation: 4494
Lim Thye Chean's answer is correct, but here's the problem in your code:
foundTitle = false
You see, foundCharacters stops at the first ’ it encounters. Then you set foundTitle = false, so when foundCharacters is called again with the rest of the string, that part is ignored (because foundTitle is now false).
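A minimal sketch to see this for yourself, written in current Swift (NSXMLParser has been XMLParser since Swift 3, and println is now print). The class ChunkLogger and the one-item sample string are made up for illustration. It records every chunk foundCharacters delivers while inside <title>; exactly where the text gets split depends on the parser implementation, but joining the chunks gives back the whole title.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical helper: logs each piece of <title> text the parser reports.
final class ChunkLogger: NSObject, XMLParserDelegate {
    private var inTitle = false
    var chunks: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "title" { inTitle = true }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // This may be called several times for a single element.
        if inTitle { chunks.append(string) }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" { inTitle = false }
    }
}

let sample = "<item><title>IBM’s First Desktop Computer</title></item>"
let logger = ChunkLogger()
let parser = XMLParser(data: Data(sample.utf8))
parser.delegate = logger
let finished = parser.parse()

// Print each chunk on its own line, then the reassembled title.
for chunk in logger.chunks { print("chunk: \(chunk)") }
print("joined: \(logger.chunks.joined())")
```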
The best solution, IMHO, is to use these three delegate methods:
1) In didStartElement, when the element is "title", reset a temporary variable such as entryTitle = String(), where entryTitle is a property of your delegate (var entryTitle = String()). This clears the string every time the parser starts a new "title" element.
2) foundCharacters is called multiple times, stopping at many "uncommon" characters. We need to append each found string to our temporary variable, so inside foundCharacters (while foundTitle is true) we should say: entryTitle += string. This collects all the little bits of string the parser finds separately.
3) Only when the parser calls didEndElement for "title" should we assume that the "title" String is complete. So it's here that we should set foundTitle = false, and also here that you should println(entryTitle).
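Putting the three steps together, here is a sketch in current Swift (XMLParser and print rather than NSXMLParser and println). foundTitle and entryTitle follow the names used above; the class TitleCollector, its titles array and the two-item sample feed are hypothetical. The second sample title contains an escaped &amp; just to show that entity text is reassembled the same way.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical delegate implementing the three steps for <title>.
final class TitleCollector: NSObject, XMLParserDelegate {
    private var foundTitle = false
    private var entryTitle = String()   // temporary buffer for one title
    var titles: [String] = []

    // Step 1: clear the buffer whenever a <title> starts.
    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "title" {
            foundTitle = true
            entryTitle = String()
        }
    }

    // Step 2: append every chunk, however many times this is called.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if foundTitle {
            entryTitle += string
        }
    }

    // Step 3: the title is only complete once </title> is reached.
    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" {
            foundTitle = false
            titles.append(entryTitle)
            print(entryTitle)
        }
    }
}

let feed = """
<rss><channel>
  <item><title>IBM’s First Desktop Computer</title></item>
  <item><title>Tom &amp; Jerry’s Day</title></item>
</channel></rss>
"""
let collector = TitleCollector()
let xmlParser = XMLParser(data: Data(feed.utf8))
xmlParser.delegate = collector
let ok = xmlParser.parse()
```

Because entryTitle is only read in didEndElement, it no longer matters how many times foundCharacters was called in between.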
I hope that helps. I've struggled a lot with the XMLParser, so I've written a short tutorial on understanding how it works: https://medium.com/@lucascerro/understanding-nsxmlparser-in-swift-xcode-6-3-1-7c96ff6c65bc
Upvotes: 2
Reputation: 9484
I found the issue. Inside an "item" element, the text of a contained element like "title" or "description" can arrive in several pieces, because foundCharacters can be called multiple times for the same element! So "IBM’s First Desktop Computer" is delivered as two chunks, and we need to combine them in some variables and only construct the result when the element ends.
So the new code works like this:
func parser(parser: NSXMLParser!, didStartElement elementName: String!, namespaceURI: String!, qualifiedName qName: String!, attributes attributeDict: [NSObject : AnyObject]!) {
    element = elementName
    if element == "item" {
        isItem = true
        titleText = ""
        ...
    }
}
// Get element text
func parser(parser: NSXMLParser!, foundCharacters string: String!) {
    if isItem {
        if element == "title" {
            titleText += string
        }
        ...
    }
}
// Construct HTML when element end
func parser(parser: NSXMLParser!, didEndElement elementName: String!, namespaceURI: String!, qualifiedName qName: String!) {
    if elementName == "item" {
        html += "<b>\(titleText)</b>"
        ...
    }
}
This works!
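A fuller sketch of the same idea in current Swift, assuming an RSS item contains only title and description children that we care about. FeedItem, ItemCollector and the sample feed are illustrative names, not part of the original code. One addition compared with the code above: element is cleared in didEndElement, so the newline and indentation between </title> and <description> are not appended to the title.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical model for one <item>.
struct FeedItem: Equatable {
    var title = ""
    var description = ""
}

final class ItemCollector: NSObject, XMLParserDelegate {
    private var element = ""          // name of the element currently open
    private var current: FeedItem?    // non-nil while inside <item>
    var items: [FeedItem] = []
    var html = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        element = elementName
        if elementName == "item" { current = FeedItem() }
    }

    // Accumulate text chunks into the field of the open element.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        guard current != nil else { return }
        switch element {
        case "title": current?.title += string
        case "description": current?.description += string
        default: break
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        // Reset so whitespace between sibling tags is not collected.
        element = ""
        if elementName == "item", let item = current {
            items.append(item)
            html += "<b>\(item.title)</b><p>\(item.description)</p>"
            current = nil
        }
    }
}

let rss = """
<rss><channel>
  <item>
    <title>IBM’s First Desktop Computer</title>
    <description>IBM introduces their System/23 Datamaster desktop computer.</description>
  </item>
</channel></rss>
"""
let itemCollector = ItemCollector()
let rssParser = XMLParser(data: Data(rss.utf8))
rssParser.delegate = itemCollector
let parsedOK = rssParser.parse()
print(itemCollector.html)
```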
Upvotes: 1
Reputation: 1657
Your guess is correct! NSXMLParser assumes that the string has already been escaped, and will run into issues with unescaped special characters: in text content that means < and &, and inside an attribute value also the quote character (' or ") that delimits it.
To do a global replace on a string, you can use the NSString method stringByReplacingOccurrencesOfString, like so:
let xml = "<description>Here's a malformed XML string. Ain't it ugly?</description>"
let escaped = xml.stringByReplacingOccurrencesOfString("'", withString: "&apos;")
After which escaped contains:
"<description>Here&apos;s a malformed XML string. Ain&apos;t it ugly?</description>"
Upvotes: 2
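If you generate XML yourself, here is a sketch of a helper that escapes all five characters XML predefines entities for (&, <, >, " and '), in current Swift, where stringByReplacingOccurrencesOfString is spelled replacingOccurrences(of:with:). xmlEscape and TextCollector are hypothetical names. & is replaced first so the ampersands introduced by the other entities are not escaped twice, and the round trip through XMLParser shows the parser turning the entities back into the original characters.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical helper: escape the five predefined XML entities.
// "&" must come first, otherwise "&lt;" would become "&amp;lt;".
func xmlEscape(_ text: String) -> String {
    return text
        .replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
        .replacingOccurrences(of: "\"", with: "&quot;")
        .replacingOccurrences(of: "'", with: "&apos;")
}

// Collects all character data, so we can see what the parser decodes.
final class TextCollector: NSObject, XMLParserDelegate {
    var text = ""
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string
    }
}

let raw = "Here's <b> & \"quotes\""
let escaped = xmlEscape(raw)
print(escaped)

// Wrap the escaped text in an element and parse it back.
let doc = "<description>\(escaped)</description>"
let textCollector = TextCollector()
let roundTrip = XMLParser(data: Data(doc.utf8))
roundTrip.delegate = textCollector
let parsed = roundTrip.parse()
print(textCollector.text)
```

A curly apostrophe (’), as in the feed from the question, is legal in XML text and needs no escaping.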