Reputation: 2524
So I think this is my last Hpple question! I have found an entry in the HTML doc that I am parsing with Hpple. I have tried many different queries, but no luck. Here is a sample of the HTML.
I can get the text starting with "Today's project" with //div[@class = 'entry-content']/p. I can also get the next tag with //div[@class = 'entry-content']//a[@title]//* along with all the text after it. However, as you can see, there is still some text after "/span", and nothing that I have tried will retrieve it. I have tried looking at the children of the element, as well as //div[@class = 'entry-content']/p//text() and //div[@class = 'entry-content']/p//following::*, but nothing works. If anyone has any ideas, I am all ears! Thanks again for all of your time.
EDIT #1: While trying different things, I took another look at the HTML. Under the p tag is the text I need, "Today's project...", then there is a span that changes the text color and includes a link, followed by more text. What I need to do is jump over that span to continue reading the text. Maybe my question should be: how do you jump over a span? Thanks for looking.
EDIT #2: Well, I am going to start a bounty on this one. I really need some help. I have looked everywhere and have tried a ton of different things, but nothing is working for me. I cannot get the text after that one closed span, and this format appears often. The author of the blog I am parsing for the app sometimes changes the style of her words, and I cannot get the text after she changes the style. Any help would be appreciated. Thanks again for looking.
EDIT #3: Here is another screenshot of the DOM tree. As you can see, I am parsing the div with class "entry-content", and the text in question is exposed. It starts with "Today...", then comes the span that changes the color of the text; I can get that text. It is the text after that which I need, " It was one.....", right before the closing p tag.
I also placed the entire HTML in a gist, HERE. The line in question is 102, although the HTML did not copy that nicely. Thanks.
Upvotes: 2
Views: 1346
Reputation: 869
I made some changes to the code to go further down the hierarchy, and it worked on your HTML sample. Note: I'm appending all of the entry-content text to a single NSMutableString to make it easier. As I warned you in the comment, use it with caution. :-)
// Load the sample HTML from the app bundle and parse it with Hpple.
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"html"];
NSData *data = [NSData dataWithContentsOfFile:filePath];
TFHpple *detailParser = [TFHpple hppleWithHTMLData:data];
NSString *xpathQueryString = @"//div[@class='entry-content']";
NSArray *node = [detailParser searchWithXPathQuery:xpathQueryString];
// Accumulates every piece of text found under entry-content.
NSMutableString *test = [[NSMutableString alloc] initWithString:@""];
for (TFHppleElement *element in node) {
    // Walk four levels below the div, appending any content found on the way.
    for (TFHppleElement *child in element.children) {
        if (child.content != nil) {
            [test appendString:child.content];
        }
        if ([child.children count] != 0) {
            for (TFHppleElement *grandchild in child.children) {
                if (grandchild.content != nil) {
                    [test appendString:grandchild.content];
                }
                for (TFHppleElement *greatgrandchild in grandchild.children) {
                    if (greatgrandchild.content != nil) {
                        [test appendString:greatgrandchild.content];
                    }
                    for (TFHppleElement *greatgreatgrandchild in greatgrandchild.children) {
                        if (greatgreatgrandchild.text != nil) {
                            [test appendString:greatgreatgrandchild.text];
                        }
                        if (greatgreatgrandchild.content != nil) {
                            [test appendString:greatgreatgrandchild.content];
                        }
                    }
                }
            }
        }
    }
}
NSLog(@"test = %@", test);
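The nested loops above stop at a fixed depth. The underlying idea is simply "collect every text node under entry-content, however deep it sits", and the text after the closing span is one of those nodes: it belongs to the p, not to the span. Here is a rough sketch of that idea in Python, using only the standard library's html.parser rather than Hpple. The class name EntryTextCollector and the SAMPLE markup are made up for illustration; they are modeled on the description in the question, not copied from the real page.

```python
from html.parser import HTMLParser


class EntryTextCollector(HTMLParser):
    """Collects every text node inside <div class="entry-content">, at any depth."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while we are inside the target div
        self.chunks = []     # text pieces in document order

    def handle_starttag(self, tag, attrs):
        # Only div tags are counted, so void tags like <br> cannot skew the depth.
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.div_depth > 0 or "entry-content" in classes:
            self.div_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        # Text before, inside, and after the span all arrive here.
        if self.div_depth > 0:
            self.chunks.append(data)


# Hypothetical markup shaped like the paragraph described in the question.
SAMPLE = (
    '<div class="entry-content"><p>Today\'s project is a card. '
    '<span style="color: #ff0000;"><a title="Example" href="#">Linked text</a></span>'
    ' It was one of my favorites.</p></div>'
)

collector = EntryTextCollector()
collector.feed(SAMPLE)
collector.close()
text = "".join(collector.chunks)
print(text)
```

The same shape should carry over to Hpple as a recursive walk over element.children, but I have not tested that version, so treat it as a direction rather than a drop-in replacement.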
Upvotes: 3
Reputation: 1659
Call me a "raw" guy, but you could read the code as a straight up string and then bust it up into an array by the tags you're going for. This could be done in PHP/Javascript/etc. Then you could just pull the array element containing the text you're looking for. Nothing fancy/external needed.
Example:
$string = '<p>text is here</p><p>more text is here</p>';
$parts = explode('<p>', $string);
Now $parts = [0] => "", [1] => "text is here</p>", [2] => "more text is here</p>" (the first element is empty because the string starts with the delimiter).
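For the case in the question, where the wanted text comes after a closing span, the same "bust it up by tags" idea might look like the following in Python. This is only a sketch: the html string is an invented stand-in for the real paragraph, and a regex like this is fragile on real pages (comments, scripts, or a ">" inside an attribute value would break it).

```python
import re

# Invented stand-in for the paragraph described in the question.
html = (
    "<p>Today's project <span><a href=\"#\">link</a></span>"
    " It was one of my favorites.</p>"
)

# Split on anything that looks like a tag, then drop the empty pieces
# that appear between adjacent tags.
parts = [piece for piece in re.split(r"<[^>]+>", html) if piece.strip()]

# The last remaining piece is the text that follows the closing span.
trailing_text = parts[-1].strip()
print(parts)
print(trailing_text)
```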
Upvotes: 0