Faz Ya

Reputation: 1480

Extract only the text from HTML content in Objective-C

All the strip functions I found extract the HTML elements from HTML content. I am looking for a simple Objective-C function that, given a nested block of HTML like:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;usg=AFQjCNFV5azq03nECHSmTV0CI-KwzBFXWA&amp;url=http://www.fool.com/investing/general/2012/03/11/the-justice-department-has-apples-number.aspx"><b>The Justice Department Has ordered <b>Apple</b> .... 

returns only the text: The Justice Department Has ordered Apple ....

I know this can be done with JavaScript in a UIWebView, but that seems a little slow because it relies on JavaScript. Is there a function that takes HTML with nested tags, ignores all the tags, and returns only the plain text content?

Thanks, Ross

Upvotes: 0

Views: 2485

Answers (1)

yuji

Reputation: 16725

Split the string on the '<' and '>' characters, keep every other element (the even-indexed ones, which are the pieces outside the tags), and join them back together:

// yourString is the HTML to strip. Splitting on '<' and '>' puts text
// outside tags at even indexes and tag bodies at odd indexes.
NSArray *components = [yourString componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"<>"]];

// Keep only the pieces that were outside tags.
NSMutableArray *componentsToKeep = [NSMutableArray array];
for (NSUInteger i = 0; i < [components count]; i += 2) {
    [componentsToKeep addObject:[components objectAtIndex:i]];
}

NSString *plainText = [componentsToKeep componentsJoinedByString:@""];
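One caveat: if the text itself contains a stray '>' (for example "a > b"), the even/odd alternation gets out of step and everything after it is dropped. A minimal alternative, sketched here under the assumption that every '<' opens a tag and the next '>' closes it, is a single pass that copies characters only while outside a tag. It is plain C, so it compiles as-is in an Objective-C file; the name strip_tags is hypothetical. Like the split approach, it does not decode entities such as &amp; and does not handle '>' inside quoted attribute values.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of html with every
   <...> tag removed, or NULL if allocation fails. The caller must free() it.
   Assumes '<' always opens a tag; entities like &amp; are left undecoded. */
char *strip_tags(const char *html) {
    size_t len = strlen(html);
    char *out = malloc(len + 1);      /* output is never longer than input */
    if (out == NULL) {
        return NULL;
    }

    size_t j = 0;
    int inside_tag = 0;
    for (size_t i = 0; i < len; i++) {
        char c = html[i];
        if (c == '<') {
            inside_tag = 1;           /* start skipping tag markup */
        } else if (c == '>' && inside_tag) {
            inside_tag = 0;           /* tag closed, resume copying text */
        } else if (!inside_tag) {
            out[j++] = c;             /* text outside any tag, including a stray '>' */
        }
    }
    out[j] = '\0';
    return out;
}
```

From Objective-C you could pass [yourString UTF8String] and wrap the result with [NSString stringWithUTF8String:], then free() the C buffer. Working byte by byte is safe for UTF-8 input because '<' and '>' are ASCII and never appear inside a multibyte sequence.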

Upvotes: 3
