Stark

Reputation: 593

XPath not returning everything after element

I'm retrieving a ul > li list of ingredients from my site, then using foreach to loop through each li.

Each <li></li> contains information in the following format: <strong>1-2 tablespoons</strong> <a href="link">coconut oil</a> (to taste). Not every item contains a hyperlink; it varies.

All I'm trying to do is break up the data so I can put it into an array like so:

array(
    0 => array(
        'amount' => '2 ounces',
        'ingredients' => 'pre-cooked chicken'
    ),
    1 => array(
        'amount' => '1-2 tablespoons',
        'ingredients' => 'coconut oil (to taste)'
    )
);

while keeping the HTML <a> link in the coconut oil part.

Here is the code that I'm using.

// $string is an array with the li content
foreach ($string as $html) {
    $try = new \DOMDocument;
    $try->loadHTML($html);
    $find = new \DOMXPath($try);

    // from this point on is where I'm having problems
    $x = $find->query('//li');
    foreach ($x as $data) {
        // pass true so print_r returns the dump instead of echoing it and returning true
        echo '<pre>', print_r($data, true), '</pre>';
    }
}

The print_r($data) call shows the following DOMElement objects (along with other empty keys such as parentNode, childNodes, firstChild, previousSibling):

DOMElement Object (
    [tagName] => li
    [schemaTypeInfo] => 
    [nodeName] => li
    [nodeValue] => 2 ounces pre-cooked chicken
    [nodeType] => 1
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [localName] => li
    [textContent] => 2 ounces pre-cooked chicken
)
DOMElement Object (
    [tagName] => li
    [schemaTypeInfo] => 
    [nodeName] => li
    [nodeValue] => 1-2 tablespoons coconut oil (to taste)
    [nodeType] => 1
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [localName] => li
    [textContent] => 1-2 tablespoons coconut oil (to taste)
)

I thought it would be best to break up the information. With one query I can get everything inside the strong tag, but the issue I'm having is getting all of the content after the strong tag.

Here I try to get all of the content after the strong tag:

$list = $find->query('//strong/following-sibling::text()');
foreach($list as $data){
    $i[] = $try->saveHTML($data);
}

If I print_r($i) I get the following:

Array
(
    [0] =>  pre-cooked chicken
    [1] =>  
    [2] =>  (to taste)
)

But if I change the query to $list = $find->query('//strong/following-sibling::*'), all I get is the following, which is just the hyperlink:

Array
(
    [0] => coconut oil
)

Update:

Input array:

Array (
    [0] => <strong>2 ounces</strong> pre-cooked chicken
    [1] => <strong>1-2 tablespoons</strong> <a href="/link">coconut oil</a> (to taste)
) 

And

Expected output:

array(
    0 => array(
        'amount' => '2 ounces',
        'ingredients' => 'pre-cooked chicken'
    ),
    1 => array(
        'amount' => '1-2 tablespoons',
        'ingredients' => '<a href="/link">coconut oil</a> (to taste)'
    )
);

Upvotes: 1

Views: 408

Answers (1)

Sahil Gulati

Reputation: 15141

Are you expecting something like this? I hope it helps. This uses preg_match to split each string at the closing strong tag.


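<?php
ini_set('display_errors', 1);

$result = array();
$array = array(
    0 => '<strong>2 ounces</strong> pre-cooked chicken',
    1 => '<strong>1-2 tablespoons</strong> <a href="/link">coconut oil</a> (to taste)'
);

foreach ($array as $data) {
    // Group 1: text inside <strong>; group 2: everything after </strong>, HTML included.
    if (preg_match('/<strong>(.*?)<\/strong>(.*)/', $data, $matches)) {
        $result[] = array(
            'amount'      => $matches[1],
            // trim() drops the space left after </strong> so the value matches the expected output
            'ingredients' => trim($matches[2])
        );
    }
}
print_r($result);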
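
If you would rather stay with DOMXPath as in the question, here is a minimal sketch of one way to do it. The two queries in the question each return only part of the content because following-sibling::text() matches only text nodes and following-sibling::* matches only elements; following-sibling::node() matches both. The function name splitIngredient and the wrapper div with id "wrap" are my own assumptions for illustration, not anything from the question's code.

```php
<?php
// Hypothetical helper: splits one <li> fragment into the <strong> amount
// and everything that follows it, keeping any HTML such as <a> links.
function splitIngredient(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known container (assumed id "wrap") so it is easy to locate.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $strong = $xpath->query('//div[@id="wrap"]/strong')->item(0);
    if ($strong === null) {
        // No <strong> found: treat the whole fragment as the ingredient.
        return array('amount' => '', 'ingredients' => trim($html));
    }

    // node() matches text nodes and elements alike, in document order.
    $rest = '';
    foreach ($xpath->query('following-sibling::node()', $strong) as $node) {
        $rest .= $doc->saveHTML($node);
    }

    return array(
        'amount'      => trim($strong->textContent),
        'ingredients' => trim($rest)
    );
}

$array = array(
    '<strong>2 ounces</strong> pre-cooked chicken',
    '<strong>1-2 tablespoons</strong> <a href="/link">coconut oil</a> (to taste)'
);
print_r(array_map('splitIngredient', $array));
```

The regex version is shorter and fine for input this regular. The DOM version may hold up better if the markup inside the li becomes less predictable, for example if the strong tag gains attributes.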

Upvotes: 1
