Reputation: 99
I am trying to parse [www.neiu.edu/~neiutemp/PhoneBook/alpha.htm] using the TFHpple parser, and I am looking for the first TD (first column) of every TR (row) in a table. All the TDs have the same attributes, so I can't tell them apart.
I am able to get all of the HTML, but I fail to get the first TD from each TR. After // 3 (in the code), tutorialsNodes is empty. The output of
NSLog(@"Nodes are : %@",[tutorialsNodes description]);
is
Practice1[62351:c07] Nodes are : ().
I can't see what's wrong. Any help would be appreciated. My code to parse this URL:
NSURL *tutorialsUrl = [NSURL URLWithString:@"http://www.neiu.edu/~neiutemp/PhoneBook/alpha.htm"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];
// 2
TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];
// 3
NSString *tutorialsXpathQueryString = @"//TR/TD";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
NSLog(@"Nodes are : %@",[tutorialsNodes description]);
// 4
NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodes) {
    // 5
    Tutorial *tutorial = [[Tutorial alloc] init];
    [newTutorials addObject:tutorial];
    // 6
    tutorial.title = [[element firstChild] content];
    // 7
    tutorial.url = [element objectForKey:@"href"];
    NSLog(@"title is: %@",[tutorial.title description]);
}
// 8
_objects = newTutorials;
[self.tableView reloadData];
Upvotes: 1
Views: 3171
Reputation: 437682
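This should work if you use @"//tr/td" instead of @"//TR/TD". XPath names are case-sensitive, and the libxml2 HTML parser that Hpple wraps reports element names in lowercase, so the uppercase query matches nothing.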
Looking at your HTML, though, since the author of that page apparently doesn't know how to spell CSS, there are font tags buried throughout the source. So, your next line of code, which is obviously taken from the excellent Hpple tutorial by Matt Galloway on Ray Wenderlich's site, says:
tutorial.title = [[element firstChild] content];
But that won't work here, because for most of your entries the firstChild is not the text, but rather a font tag. So you could check whether it is a font tag like so:
TFHppleElement *subelement = [element firstChild];
if ([[subelement tagName] isEqualToString:@"font"])
    subelement = [subelement firstChild];
tutorial.title = [subelement content];
Or you could just search for @"//tr/td/font" instead of @"//tr/td". There are lots of approaches here. The trick (as with all HTML parsing) is to make it reasonably robust, so you won't be susceptible to minor cosmetic tweaks of the page.
And obviously, your HTML doesn't have URLs there, so the tutorial.url line (which reads an href attribute) isn't applicable here.
Anyway, I hope this is enough to get you going.
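To illustrate the robustness point, here is a minimal sketch of a helper that does not care whether the text sits directly in the cell or inside a font tag (or any other wrapper). FirstTextInElement is a hypothetical name of my own, not part of Hpple; it only relies on the content and children methods of TFHppleElement, and it assumes that text shows up as the content of the cell or of one of its descendants.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of Hpple): returns the first non-blank text
// found at or below `element`, searching depth-first, or nil if there is none.
static NSString *FirstTextInElement(TFHppleElement *element) {
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *trimmed = [[element content] stringByTrimmingCharactersInSet:blank];
    if ([trimmed length] > 0) {
        return trimmed;
    }
    // No usable text on this node, so look inside its children
    // (for example a font tag wrapping the text).
    for (TFHppleElement *child in [element children]) {
        NSString *text = FirstTextInElement(child);
        if (text != nil) {
            return text;
        }
    }
    return nil;
}
```

With that in place, the loop body could be reduced to tutorial.title = FirstTextInElement(element); and it would keep working if the page wrapped the text in another tag.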
You report having issues, so I thought I'd just supply a more complete code sample:
NSURL *tutorialsUrl = [NSURL URLWithString:@"http://www.neiu.edu/~neiutemp/PhoneBook/alpha.htm"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];
TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];
NSString *tutorialsXpathQueryString = @"//tr/td";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
if ([tutorialsNodes count] == 0)
    NSLog(@"nothing there");
else
    NSLog(@"There are %lu nodes", (unsigned long)[tutorialsNodes count]); // count is an NSUInteger, so cast for %lu
NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodes) {
    Tutorial *tutorial = [[Tutorial alloc] init];
    [newTutorials addObject:tutorial];
    TFHppleElement *subelement = [element firstChild];
    if ([[subelement tagName] isEqualToString:@"font"])
        subelement = [subelement firstChild];
    tutorial.title = [subelement content];
    NSLog(@"title is: %@", [tutorial.title description]);
}
That yields the following output:
2013-05-10 19:39:42.027 hpple-test[33881:c07] There are 10773 nodes
2013-05-10 19:39:42.028 hpple-test[33881:c07] title is: A
2013-05-10 19:39:46.027 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:46.698 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:47.333 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:47.827 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:48.358 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:49.133 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:49.775 hpple-test[33881:c07] title is: Abay, Hiwet B
2013-05-10 19:39:50.326 hpple-test[33881:c07] title is: H-Abay
2013-05-10 19:39:50.992 hpple-test[33881:c07] title is: 773-442-5140
2013-05-10 19:39:51.597 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:52.092 hpple-test[33881:c07] title is: Controller
2013-05-10 19:39:52.598 hpple-test[33881:c07] title is: E
2013-05-10 19:39:53.149 hpple-test[33881:c07] title is: 223
2013-05-10 19:39:55.040 hpple-test[33881:c07] title is: Abbruscato, Terence
2013-05-10 19:39:55.806 hpple-test[33881:c07] title is: T-Abbruscato
2013-05-10 19:39:56.525 hpple-test[33881:c07] title is: 773-442-5339
...
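Note that this output contains every column of every row, while the question asked only for the first column. One way to narrow it down, sketched here under assumptions, is the XPath position predicate @"//tr/td[1]", which selects only the td that is the first td child of its tr. FirstColumnTitles is a hypothetical helper name of my own, and the sketch assumes the same cell layout as above (text either directly in the cell or inside a single font tag); cells with no text are skipped rather than logged as (null).

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of Hpple): collects the text of the first
// cell of every table row in the given HTML, skipping cells without text.
static NSArray *FirstColumnTitles(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    // td[1] keeps only the first td child of each tr.
    NSArray *cells = [parser searchWithXPathQuery:@"//tr/td[1]"];
    NSMutableArray *titles = [NSMutableArray array];
    for (TFHppleElement *cell in cells) {
        TFHppleElement *node = [cell firstChild];
        // Assumption: the text is wrapped in at most one font tag.
        if ([[node tagName] isEqualToString:@"font"]) {
            node = [node firstChild];
        }
        NSString *title = [node content];
        if ([title length] > 0) {
            [titles addObject:title];
        }
    }
    return titles;
}
```

You would call it as NSArray *titles = FirstColumnTitles(tutorialsHtmlData); and build the Tutorial objects from the returned strings. Header rows (such as the single letter at the top of each section) would still come through, so you may want to filter those separately.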
Upvotes: 2