Reputation: 3
I am using NSXMLParser to parse XML from a URL. Some of the elements contain special characters and italic markup in their text.
<name>Verify Settings<i>i</i>patch level</name>
NSXMLParser breaks the text and gives Output: Verify Settings
Is there any way to parse italic text inside an element?
<impact> In 2003, the ¿shared APPL_TOP¿ architecture was introduced, which allowed the sharing of a single APPL_TOP, however the tech stack · Reduced disk space requirements · Reduced maintenance · Reduced administrative costs · Reduced patching down time · Less complex to add additional nodes, making scalability easier · Complexity of instance reduced · Easier backups · Easier cloning</impact>
It breaks the text and gives Output: e costs ·Reduced patching down time ·Less complex to add additional nodes, making scalability easier ·Complexity of instance reduced ·Easier backups ·Easier cloning
Any suggestions on how to parse italic tags and special characters in the text using NSXMLParser?
Here is my foundCharacters code:
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (!self.currentStringValue) {
        // currentStringValue is an NSMutableString instance variable
        self.currentStringValue = [[NSMutableString alloc] init];
    }
    [self.currentStringValue appendString:string];
}
Upvotes: 0
Views: 387
Reputation: 437592
Both of these look less like XML parsing problems than XML generation problems. How are you generating this XML? It looks like hand-built XML, as opposed to something generated by a proper XML library.
Look at your XML from the parser's perspective: how is NSXMLParser supposed to know that the <i> is HTML inside the <name> element, and not a new XML element in its own right? If this is indeed what the XML looks like, you really should just fix your web service.
For example, the problem with the italics is that the <i> looks like the start of a new element. Generally that should be represented either as:
<name>Verify Settings&lt;i&gt;i&lt;/i&gt;patch level</name>
Or as
<name><![CDATA[Verify Settings<i>i</i>patch level]]></name>
This encoding of the name property is generally done by the API that does the XML encoding in the web service. Generally you don't need to do anything to get this behavior. But if your web service is manually creating its own XML, that could give you the sort of output that you describe in your original question.
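To make the escaping concrete, here is a minimal sketch of what a hand-rolled generator would need to do to element text before writing it out. The helper name EscapeXMLText is hypothetical (it is not a Foundation API), and a real web service should normally rely on its XML library to do this rather than on string replacement:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of Foundation: escapes the characters
// that would otherwise be read as markup inside element text.
static NSString *EscapeXMLText(NSString *text) {
    NSMutableString *escaped = [text mutableCopy];
    // "&" has to be handled first, so the entities added below
    // are not escaped a second time.
    [escaped replaceOccurrencesOfString:@"&" withString:@"&amp;"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, escaped.length)];
    [escaped replaceOccurrencesOfString:@"<" withString:@"&lt;"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, escaped.length)];
    [escaped replaceOccurrencesOfString:@">" withString:@"&gt;"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, escaped.length)];
    return escaped;
}

int main(void) {
    @autoreleasepool {
        NSString *name = @"Verify Settings<i>i</i>patch level";
        // Wrap the escaped text in the element before sending it to the client.
        NSLog(@"<name>%@</name>", EscapeXMLText(name));
    }
    return 0;
}
```

With text escaped this way, the parser sees the <i> as character data belonging to <name> rather than as a child element.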
On the second example, I would have thought that the characters in the XML must conform to the encoding named in the <?xml ...> declaration, e.g.:
<?xml version="1.0" encoding="ISO-8859-1"?>
What does your <?xml ...> declaration say? Do the characters in question fall within the encoding listed there?
Looking at your revised foundCharacters, the new rendition is much better. The previous rendition assumed that foundCharacters would be called only once between any given pair of <name> and </name> tags. That is not necessarily the case: the parser may deliver the text of a single element in several pieces. Your latest rendition correctly creates currentStringValue if it needs to, and then appends to it. That is the correct approach, consistent with the examples in the Apple documentation. You might want to do that only while parsing one of the elementName types that you care about (e.g. <name>), but with that minor caveat, this new rendition looks much better.
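For what it's worth, the "only for the elements you care about" variation might look roughly like the sketch below. The NameCollector class, its names property, and the CollectNames function are hypothetical names for illustration, and the sample XML assumes the web service has already been fixed to escape the <i> markup:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text of every <name> element.
@interface NameCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray<NSString *> *names;
@property (nonatomic, strong) NSMutableString *currentStringValue;
@end

@implementation NameCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict
{
    // Start a fresh buffer only for the element we care about.
    if ([elementName isEqualToString:@"name"]) {
        self.currentStringValue = [[NSMutableString alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // May be called several times per element, so always append.
    // Outside <name> the buffer is nil and this message does nothing.
    [self.currentStringValue appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"name"] && self.currentStringValue) {
        [self.names addObject:[self.currentStringValue copy]];
        self.currentStringValue = nil; // stop collecting until the next <name>
    }
}

@end

// Parses the XML string and returns the collected names, or nil on a parse error.
static NSArray<NSString *> *CollectNames(NSString *xml) {
    NSXMLParser *parser =
        [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    NameCollector *collector = [[NameCollector alloc] init];
    collector.names = [NSMutableArray array];
    parser.delegate = collector;
    return [parser parse] ? [collector.names copy] : nil;
}

int main(void) {
    @autoreleasepool {
        NSString *xml = @"<items><name>Verify Settings&lt;i&gt;i&lt;/i&gt;"
                        @"patch level</name></items>";
        NSLog(@"%@", CollectNames(xml));
    }
    return 0;
}
```

Because the buffer is reset in didStartElement and closed out in didEndElement, the pieces delivered by multiple foundCharacters calls end up concatenated into a single string per <name>.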
Upvotes: 1