Reputation: 2374
I have a long, complex XHTML file that contains CSS. Searching on Google and on this site, I've found some libraries that can be useful for parsing XHTML.
However, I'm wondering whether there is any iPhone library that can convert an XHTML + CSS document to an NSAttributedString (only the text, of course).
I have been thinking about this problem and have some ideas, but I don't think they would be very efficient. My main idea consists of these steps:
Find the elements that have an id or class attribute and get the range of the string where they have effect (I cannot achieve this).
Save all the CSS attributes in an NSDictionary, with more NSDictionary objects inside. Something like this:
mainDict {
    object: dictionary {
        object: @"#00ff00"
        key: @"color"
        object: @"1em"
        key: @"font-size"
    }
    key: @"a id"
    object: anotherDictionary {
        ...
    }
    key: @"another id"
}
Convert this dictionary of CSS attributes into an NSAttributedString attributes dictionary.
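A minimal sketch of that last step, assuming iOS 6 or later (where UIKit defines NSForegroundColorAttributeName and NSFontAttributeName) and ARC. The helper names ColorFromHexString and TextAttributesFromCSS are hypothetical, and only #rrggbb colors and em/px font sizes are handled, so treat it as one possible mapping rather than a complete CSS implementation.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical helper: turns "#rrggbb" into a UIColor, or nil if malformed.
static UIColor *ColorFromHexString(NSString *hex) {
    if (![hex hasPrefix:@"#"] || hex.length != 7) {
        return nil;
    }
    unsigned int rgb = 0;
    NSScanner *scanner = [NSScanner scannerWithString:[hex substringFromIndex:1]];
    if (![scanner scanHexInt:&rgb] || ![scanner isAtEnd]) {
        return nil;
    }
    return [UIColor colorWithRed:((rgb >> 16) & 0xFF) / 255.0
                           green:((rgb >> 8) & 0xFF) / 255.0
                            blue:(rgb & 0xFF) / 255.0
                           alpha:1.0];
}

// Hypothetical helper: maps one selector's CSS properties to text attributes.
// baseFontSize is the inherited size in points that "em" values scale.
static NSDictionary *TextAttributesFromCSS(NSDictionary *css, CGFloat baseFontSize) {
    NSMutableDictionary *attributes = [NSMutableDictionary dictionary];

    UIColor *color = ColorFromHexString(css[@"color"]);
    if (color) {
        attributes[NSForegroundColorAttributeName] = color;
    }

    NSString *size = css[@"font-size"];
    if ([size hasSuffix:@"em"]) {
        // "em" is relative to the inherited font size.
        CGFloat points = [size floatValue] * baseFontSize;
        attributes[NSFontAttributeName] = [UIFont systemFontOfSize:points];
    } else if ([size hasSuffix:@"px"]) {
        // Assumption: one CSS pixel is treated as one point.
        attributes[NSFontAttributeName] = [UIFont systemFontOfSize:[size floatValue]];
    }
    return attributes;
}
```

With a nested dictionary like the one above, the inner dictionary for one selector could be passed to TextAttributesFromCSS together with a base size such as 16.0, and the result handed to -[NSAttributedString initWithString:attributes:].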
I know this is complex, and I don't need you to provide the code (though it would be great if you did). I only want a link to a library or, if none exists, some advice on creating a parser myself.
Of course, if you need more information, ask in the comments.
Thank you!
Upvotes: 0
Views: 890
Reputation: 433
My way of parsing an HTML string into an NSAttributedString is to recursively append each parsed node (and its child nodes) to an NSMutableAttributedString.
I am not ready to publish my full code yet, but hopefully this gives you some hints.
NSString+HTML.h
/* - toHTMLElements
 * Parses the string itself into a dictionary of HTML elements under the following keys:
 *   @"attributedString"  // HTML main body
 *   @"insets"            // images and/or videos with range info
 *   @"as"                // hrefs with range info
 */
- (NSMutableDictionary*) toHTMLElements;
NSString+HTML.m
- (NSMutableDictionary*) toHTMLElements {
    // …
    // handle escape encoding here
    // assume that NSString* htmlString is the processed string;
    // …
    // +dictionary already returns an autoreleased object, so no extra retain/autorelease pair is needed
    NSMutableDictionary * htmlElements = [NSMutableDictionary dictionary];
    NSMutableAttributedString * attributedString = [[[NSMutableAttributedString alloc] init] autorelease];
    NSMutableArray * insets = [NSMutableArray array];
    NSMutableArray * as = [NSMutableArray array];
    [htmlElements setObject:attributedString forKey:HTML_ATTRIBUTEDSTRING];
    [htmlElements setObject:insets forKey:HTML_INSETS];
    [htmlElements setObject:as forKey:HTML_AS];

    // parse the HTML with an XML parser
    // CXXML is a variant of TBXML (http://www.tbxml.co.uk/ ) which can handle inline tags such as <span>
    // code not available to the public yet, so write your own inline-tag-enabled HTML/XML parser.
    CXXML * xml = [CXXML tbxmlWithXMLString:htmlString];
    TBXMLElement * root = xml.rootXMLElement;
    // root is NULL if parsing failed; TBXMLElement is a C struct, so compare against NULL
    TBXMLElement * next = (root != NULL) ? root->firstChild : NULL;
    while (next != NULL) {
        //
        // do something here for special treatments if needed
        //
        NSString * tagName = [CXXML elementName:next];
        [self appendXMLElement:next
                withAttributes:[HTMLElementAttributes defaultAttributesFor:tagName]
                toHTMLElements:htmlElements];
        next = next->nextSibling;
    }
    return htmlElements;
}
- (void) appendXMLElement:(TBXMLElement*)aElement withAttributes:(NSDictionary*)parentAttributes toHTMLElements:(NSMutableDictionary*) htmlElements {
    // parse aElement and its attribute values here;
    // assume NSString * tagAttrString is the parsed CSS for this tag (either from its "style"
    // attribute or from a CSS file), for example: width:200px; color:#123456;

    // let an external HTMLElementAttributes class apply this tag's CSS on top of the parent node's attributes
    NSDictionary * tagAttr = [HTMLElementAttributes updateAttributes:parentAttributes withCSSAttributes:tagAttrString];

    // create your NSAttributedString styled by tagAttr
    // create insets such as images / videos or hyperlink objects
    // then update htmlElements for storage

    // once this tag is handled, recursively visit and process the current tag's children
    TBXMLElement * nextChild = aElement->firstChild;
    while (nextChild != NULL) {
        [self appendXMLElement:nextChild withAttributes:tagAttr toHTMLElements:htmlElements];
        nextChild = nextChild->nextSibling;
    }
}
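The HTMLElementAttributes class is not shown, so here is a rough sketch of what its updateAttributes:withCSSAttributes: step might do, written as a plain function. UpdateAttributes is a hypothetical stand-in, not the author's code. It assumes the CSS arrives as simple "name: value;" declarations and, as a simplification, treats every property as inherited by child nodes (real CSS does not inherit properties such as width).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for +[HTMLElementAttributes updateAttributes:withCSSAttributes:].
// Copies the parent's attributes, then overlays the declarations found in cssString.
static NSDictionary *UpdateAttributes(NSDictionary *parentAttributes, NSString *cssString) {
    NSMutableDictionary *merged =
        [NSMutableDictionary dictionaryWithDictionary:(parentAttributes ?: @{})];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    for (NSString *declaration in [cssString componentsSeparatedByString:@";"]) {
        NSRange colon = [declaration rangeOfString:@":"];
        if (colon.location == NSNotFound) {
            continue; // skip empty or malformed pieces
        }
        NSString *name = [[[declaration substringToIndex:colon.location]
                           stringByTrimmingCharactersInSet:whitespace] lowercaseString];
        NSString *value = [[declaration substringFromIndex:NSMaxRange(colon)]
                           stringByTrimmingCharactersInSet:whitespace];
        if (name.length > 0 && value.length > 0) {
            merged[name] = value; // the child's value overrides the inherited one
        }
    }
    return merged;
}
```

Calling UpdateAttributes with a parent of color #000000 and font-size 12px plus the string "width:200px; color:#123456;" yields color #123456, font-size 12px and width 200px. Because each child receives the merged dictionary, the recursion above carries styles down the tree without any global state.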
Upvotes: 1
Reputation: 6715
Whether this does what you want depends on your needs, but DTCoreText has an HTML -> NSAttributedString converter. It is tailored to what DTCoreText itself needs to do, but it might at least point you in the right direction.
Upvotes: 2