Reputation: 15
I want to scrape a webpage's content, which I have already managed to do. My problem is that I can't get the accurate link text when the link contains a child tag. For example, my query is "//div[@class='someclass']/div/a/text()"
. It returns the correct result if the link looks like this: <a href='somelink'> this is link </a>
(my output is: this is link). But if the link is <a href='somelink'> this is <br /> another text </a>
, then my output is "this is, another text", because of the child tag br. After some googling, I think the solution may be fn:string(), but I can't figure out how to use fn:string() in XQuery/XPath in PHP.
Upvotes: 0
Views: 419
Reputation: 2025
You didn't show your HTML code, so I assume it looks like this:
<div class='someclass'>
<div class='otherclass'>
<a href='somelink'> some text including child element </a>
</div>
</div>
You can try the following expression:
//div[@class='someclass']/div/*
It selects every element inside the otherclass div. If you then process the result as shown below, your problem may be solved:
<?php
// Assumes $xpath is a DOMXPath object built from the page you already loaded.
$linkQuery = $xpath->query("//div[@class='someclass']/div/*");
$linkText = array();
foreach ($linkQuery as $node) {
    // nodeValue concatenates the text of all descendants; collapse whitespace runs.
    $linkText[] = preg_replace('/\s+/', ' ', $node->nodeValue);
}
?>
Now $linkText holds the full text of each link, including the text of its child elements.
Upvotes: 0
Reputation: 38682
text()
selects all text nodes directly below a certain element. For <a href='somelink'> this is <br /> another text </a>
, these are two text nodes; in the case of <a href='somelink'> this is <strong>another</strong> text </a>
, it will even omit the word another
, as that text isn't a direct child of the anchor tag.
If you are querying a single anchor tag within one XPath expression, use the string($element)
function without any text()
step, e.g.
string(//div[@class='someclass']/div/a)
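In PHP, an expression like this can be run with DOMXPath::evaluate(), which returns a scalar where the expression produces one. The following is a minimal sketch, assuming made-up sample markup modeled on the question (the $html string is hypothetical, not the asker's real page):

```php
<?php
// Hypothetical sample markup modeled on the question; not the real page.
$html = "<div class='someclass'><div><a href='somelink'> this is <br /> another text </a></div></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string(...) takes the string value of the first matching anchor,
// i.e. the concatenated text of all its descendant text nodes.
$raw = $xpath->evaluate("string(//div[@class='someclass']/div/a)");

// Collapse the whitespace left around the <br /> and trim the ends.
$text = trim(preg_replace('/\s+/', ' ', $raw));

echo $text, "\n"; // prints: this is another text
```

The whitespace cleanup is done in PHP here; XPath 1.0's normalize-space() could do the same inside the expression.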
If your expression returns a sequence (in PHP: list/array) of results, loop over the results and for each anchor tag run the XPath expression string(.)
(with .
being the current context). For more control, you might want to use .//text()
to fetch all text nodes below the current context, and concatenate them in PHP. There's another answer explaining this in detail.
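Both approaches for multiple results might be sketched as follows. This assumes hypothetical sample markup with two links; the variable names ($linkText, $parts, $joined) are illustrative only:

```php
<?php
// Hypothetical sample markup with two links; an assumption for illustration.
$html = "<div class='someclass'>"
      . "<div><a href='a'> this is <br /> another text </a></div>"
      . "<div><a href='b'> this is <strong>another</strong> text </a></div>"
      . "</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Approach 1: loop over the anchors and evaluate string(.) for each one.
$linkText = [];
foreach ($xpath->query("//div[@class='someclass']/div/a") as $anchor) {
    // The second argument is the context node, so "." is the current anchor.
    $value = $xpath->evaluate('string(.)', $anchor);
    $linkText[] = trim(preg_replace('/\s+/', ' ', $value));
}

// Approach 2: fetch every descendant text node and concatenate in PHP.
// Shown for the second anchor only.
$second = $xpath->query("//div[@class='someclass']/div/a")->item(1);
$parts = [];
foreach ($xpath->query('.//text()', $second) as $textNode) {
    $parts[] = trim($textNode->nodeValue);
}
$joined = implode(' ', $parts);

print_r($linkText);
echo $joined, "\n";
```

The second approach is longer, but it lets you choose the separator or skip individual text nodes before joining them.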
Be aware PHP only supports XPath 1.0 – no XQuery, and no XPath 2.0.
Upvotes: 1