user3587919

Reputation: 15

fn:string function for web content

I want to scrape a webpage's content, which I have already done. My problem is that I can't get the accurate link text when the link text contains a child tag. For example, my query is "//div[@class='someclass']/div/a/text()". It gives the correct result if the link looks like <a href='somelink'> this is link </a> (my output is: this is link). But if the link is <a href='somelink'> this is <br /> another text </a>, my output is split into two pieces, "this is" and "another text", because of the child tag br. After some googling, I think the solution may be fn:string(), but I can't figure out how to use fn:string() in XQuery/XPath in PHP.

Upvotes: 0

Views: 419

Answers (2)

sabbir

Reputation: 2025

You didn't show your HTML code, so I guess it looks like this:

<div class='someclass'>
   <div class='otherclass'>
      <a href='somelink'> some text including child element </a>
   </div> 
</div>

You can try the following expression:

//div[@class='someclass']/div/*

It will give you every element inside the otherclass div. If you then read each element's nodeValue as shown below, your problem may be solved:

<?php
   // Assumes $xpath is a DOMXPath created from the loaded page.
   $linkQuery = $xpath->query("//div[@class='someclass']/div/*");

   $linkText = array();

   // nodeValue holds the text of the element and of all its descendants;
   // preg_replace collapses runs of whitespace into single spaces.
   foreach ($linkQuery as $node) {
      $linkText[] = preg_replace('/\s+/', ' ', $node->nodeValue);
   }
?>

Now you get all the text inside your link, including the text of any child elements.

Upvotes: 0

Jens Erat

Reputation: 38682

text() selects only the text nodes directly below a given element. For <a href='somelink'> this is <br /> another text </a>, there are two such text nodes, so you get two results. In the case of <a href='somelink'> this is <strong>another</strong> text </a>, it will even omit the word another, as that text isn't a direct child of the anchor tag.
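To make this behaviour concrete, here is a minimal sketch. The markup in $html is a hypothetical sample modelled on the two links above (the real page was not shown), and the variable names are illustrative only.

```php
<?php
// Hypothetical sample markup mirroring the two links discussed above.
$html = "<div class='someclass'><div>"
      . "<a href='somelink'> this is <br /> another text </a>"
      . "<a href='otherlink'> this is <strong>another</strong> text </a>"
      . "</div></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() matches only text nodes that are direct children of each <a>,
// so the word inside <strong> never shows up in the result.
$direct = array();
foreach ($xpath->query("//div[@class='someclass']/div/a/text()") as $textNode) {
    $direct[] = trim($textNode->nodeValue);
}

print_r($direct);
```

The first link contributes two separate entries, and the second link contributes its two direct text nodes without the word wrapped in strong.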

If you are querying a single anchor tag within one XPath expression, use the string($element) function without any text() matcher, e.g.:

string(//div[@class='someclass']/div/a)
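In PHP this expression could be run roughly as follows. This is only a sketch: the $html sample is assumed, and it relies on DOMXPath::evaluate(), since DOMXPath::query() returns node lists only and cannot hand back the string that string() produces.

```php
<?php
// Hypothetical sample markup with a single link containing a child tag.
$html = "<div class='someclass'><div>"
      . "<a href='somelink'> this is <br /> another text </a>"
      . "</div></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// evaluate() can return typed results (here: a string) in addition to node lists.
// string() takes the first matching <a> and concatenates all of its text.
$raw = $xpath->evaluate("string(//div[@class='someclass']/div/a)");

// Collapse the whitespace runs left over from the markup.
$linkText = trim(preg_replace('/\s+/', ' ', $raw));

echo $linkText, "\n";
```

Note that in XPath 1.0 string() applied to several matching nodes uses only the first one in document order, which is why this form suits the single-anchor case.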

If your expression returns a sequence (in PHP: list/array) of results, loop over the results and for each anchor tag run the XPath expression string(.) (with . being the current context). For more control, you might want to use .//text() to fetch all text nodes below the current context, and concatenate them in PHP. There's another answer explaining this in detail.
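Both variants for multiple results might look like the sketch below. The sample markup and all variable names are assumptions for illustration; the second argument of evaluate() and query() sets the context node, so that . refers to the current anchor.

```php
<?php
// Hypothetical sample markup with two links that contain child tags.
$html = "<div class='someclass'><div>"
      . "<a href='somelink'> this is <br /> another text </a>"
      . "<a href='otherlink'> see <strong>more</strong> here </a>"
      . "</div></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$anchors = $xpath->query("//div[@class='someclass']/div/a");

// Variant 1: run string(.) once per anchor.
$texts = array();
foreach ($anchors as $anchor) {
    $raw = $xpath->evaluate("string(.)", $anchor);
    $texts[] = trim(preg_replace('/\s+/', ' ', $raw));
}

// Variant 2: fetch every descendant text node and join them in PHP.
$joined = array();
foreach ($anchors as $anchor) {
    $parts = array();
    foreach ($xpath->query(".//text()", $anchor) as $textNode) {
        $part = trim($textNode->nodeValue);
        if ($part !== '') {
            $parts[] = $part;
        }
    }
    $joined[] = implode(' ', $parts);
}

print_r($texts);
print_r($joined);
```

The second variant is longer, but it gives a place to filter or transform individual text nodes before they are concatenated.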

Be aware PHP only supports XPath 1.0 – no XQuery, and no XPath 2.0.

Upvotes: 1
