Reputation: 4755
I'm trying to extract specific data from an HTML page using XPath queries, but I'm having trouble selecting some of the parts.
Using //div[@id='main']/h2 as my XPath query, I am able to pull the "View Current" text with the following:
exampleSite.title = [[element firstChild] content];
However I would also like to pull in the following:
1. <b>5/9/2013<nbsp><nbsp> 10:58:45 PM</b>
2. <b>6.32</b>
3. <b>5 Total Points</b>
4. <b>3.72</b>
So far I've got this: //div[@id='main']/table[@class='bodytext']/tr
but that's where I get stuck. Any help would be greatly appreciated! Thank you!
Here is the HTML I'm attempting to scrape:
<div id="main">
<h2>View Current</h2>
<table width="96%" border="0" cellpadding="4" cellspacing="0" bordercolor="#eeeeee" align="center" height="276" valign="top" class="bodytext">
<tr valign="top" >
<td colspan = 2 height="13" valign="top" align="left" width="54%" class="headerblue" >Balances <br>
</td>
</tr>
<tr valign="top" >
<td colspan = 2 height="13" valign="top" align="left" width="54%" class="text" >Balances
as of: <b>5/9/2013<nbsp><nbsp> 10:58:45 PM</b></td>
</tr>
<tr valign="top" >
<td colspan = 2 height="13" valign="top" align="left" width="46%" class="text" >Account
Number: <b>101010123</b></td>
</tr>
<tr valign="top" >
<td colspan = 2 height="13" valign="top" align="left" width="46%" class="text" ></td>
</tr>
<tr valign="top" >
<td height="13" valign="top" align="left" width="46%" class="text" >Example Card Amount:
<b>6.32</b></td>
<td height="13" valign="top" align="left" width="46%" class="text" ><a href="balance.asp?">View Details</a></td>
</tr>
<tr valign="top" >
<td height="13" valign="top" align="left" width="46%" class="text" >Example Dining Plans:<b>5 Total Points</b>
</td>
<td height="13" valign="top" align="left" width="46%" class="text" ><a href="balance2.asp?">View Details</a></td>
</tr>
<tr valign="top" >
<td height="13" valign="top" align="left" width="46%" class="text" >Credit For Printing:
<b>3.72</b></td>
<td height="13" valign="top" align="left" width="46%" class="text" ><a href="balance1.asp?">View Details</a></td>
</tr>
<td colspan = 2 height="13" valign="top" align="CENTER" class="text">For
questions contact Cashiers at<BR> (000)000-0011 or <a href="mailto:[email protected]">[email protected]</a></td>
</tr>
<tr valign="top">
<td colspan = 2 height="13" valign="top" align="CENTER" class="text" >
<a href="balance1.asp">All Plan Usage for last 90 days is available here</a>
</td>
</tr>
<tr valign="top">
<td colspan = 2 height="13" valign="top" align="CENTER" class="text" >
<a href="balance.asp?pln=Full">All Usage for last 365 days is available here</a>
</td>
</tr>
</table>
</div>
Upvotes: 0
Views: 501
Reputation: 19863
This extends Mennny's answer, which is correct, so you should accept it. I'll try to answer the additional questions you asked in the comments:
You do your parsing like this (htmlData is my demo data):
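// Load the demo HTML from disk; in your app this would be the downloaded page data.
NSData *htmlData = [NSData dataWithContentsOfFile:[@"/Users/dennis/Desktop/demo.html" stringByStandardizingPath]];
// Parse the HTML and select every <b> that sits directly inside a table cell.
TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
NSArray *bTags = [parser searchWithXPathQuery:@"//div[@id='main']/table[@class='bodytext']/tr/td/b"];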
After that, you put the contents of the parsed <b> tags in an NSMutableArray.
NSMutableArray *stringsInBTag = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in bTags) {
    // Collect the text content of each <b> element, in document order.
    [stringsInBTag addObject:element.content];
}
What you get is this (logged output of the array):
( "5/9/2013", 101010123, "6.32", "5 Total Points", "3.72" )
Now you want to set your labels:
// Set label 1 to third <b>
self.label1.text = stringsInBTag[2];
// Set label 2 to first <b>
self.label2.text = stringsInBTag[0];
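One thing to note about the logged array: the first entry holds only "5/9/2013", without the time. A likely cause (this is an assumption, I have not verified it against your real page) is that the parser treats each <nbsp> as an unknown element, so " 10:58:45 PM" ends up in a nested node instead of directly under the <b>. A minimal sketch of a workaround is to collect the text of all descendants recursively. CollectText is a hypothetical helper, not part of TFHpple, and it assumes text nodes are exposed as children with the tag name "text", which is what [[element firstChild] content] in the question relies on.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): concatenates the text of an
// element and all of its descendants, in document order.
static NSString *CollectText(TFHppleElement *element) {
    // Assumption: text nodes appear as children whose tagName is "text".
    if ([element.tagName isEqualToString:@"text"]) {
        return element.content ?: @"";
    }
    NSMutableString *result = [NSMutableString string];
    for (TFHppleElement *child in element.children) {
        [result appendString:CollectText(child)];
    }
    return result;
}
```

In the loop above you would then call [stringsInBTag addObject:CollectText(element)]; instead of adding element.content. You may still want to trim surrounding whitespace from the result.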
Upvotes: 1
Reputation: 405
//div[@id='main']/table[@class='bodytext']/tr/td/b
should give you a list of all <b> elements in your table cells.
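If you would rather pick out one value than index into the whole list, an XPath predicate on the cell's label text may help. The following is only a sketch: ValueForLabel is a hypothetical helper name, it assumes the label text (for example "Example Card Amount") stays stable on the page, and it reads the value through the element's content property, so adjust that if your TFHpple version exposes the text differently.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns the trimmed text of the first <b> found in a
// cell whose text contains `label`, or nil if no such cell exists.
// Note: `label` must not contain a single quote, since it is spliced into
// the XPath string literal unescaped.
static NSString *ValueForLabel(TFHpple *parser, NSString *label) {
    // normalize-space(.) collapses line breaks inside the cell, so labels
    // that wrap in the HTML source (such as "Balances as of:") still match.
    NSString *query = [NSString stringWithFormat:
        @"//div[@id='main']/table[@class='bodytext']/tr"
        @"/td[contains(normalize-space(.), '%@')]/b", label];
    NSArray *matches = [parser searchWithXPathQuery:query];
    TFHppleElement *first = [matches firstObject];
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return [first.content stringByTrimmingCharactersInSet:blank];
}
```

With a parser created via initWithHTMLData:, a call such as ValueForLabel(parser, @"Credit For Printing") would look only at the row with that label, so the result does not depend on the order of the other rows.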
Upvotes: 2