jack1881

Reputation: 33

Get text from all <li> tags which also include <a> tags

I have a few <li> tags inside a <div> like this:

<li> <a href="link1"> one <li>
<li> <a href="link2"> two <li>
<li> <a href="link3"> three <li>

How can I get the link text (for example, two) using an HTML DOM parser and then put it in an array to use later?

Upvotes: 0

Views: 5557

Answers (3)

mickmackusa

Reputation: 47904

Your input HTML is broken because the </a> closing tags are missing, but you can still use a legitimate DOM parser to get what you need. XPath lets you target the desired text nodes directly. Whether you trim the surrounding whitespace is up to you.

Code:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

Keep whitespace:

var_export(
    array_column(
        iterator_to_array($xpath->query('//li/a/text()')),
        'nodeValue'
    )
);

Output:

array (
  0 => ' one ',
  1 => ' two ',
  2 => ' three ',
)

Trim whitespace:

var_export(
    array_map(
        fn($text) => trim($text->nodeValue),
        iterator_to_array($xpath->query('//li/a/text()')),
    )
);

Output:

array (
  0 => 'one',
  1 => 'two',
  2 => 'three',
)

If your HTML were fully valid, the solution would be simpler:

$html = <<<HTML
<li><a href="link1">one</a></li>
<li><a href="link2">two</a></li>
<li><a href="link3">three</a></li>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
var_export(
    array_column(
        iterator_to_array($doc->getElementsByTagName('li')),
        'nodeValue'
    )
);
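
The title asks for <li> tags which also include <a> tags, so here is a minimal sketch of that filter, assuming valid HTML. The sample markup (including the link-free middle item) and the $texts variable are made up for illustration. The XPath predicate //li[a] matches only <li> elements that have an <a> child.

```php
<?php
// Hypothetical sample input: the middle item has no link.
$html = '<ul>
<li><a href="link1">one</a></li>
<li>no link here</li>
<li><a href="link3">three</a></li>
</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$texts = [];
// "//li[a]" keeps only the <li> elements that contain an <a> child.
foreach ($xpath->query('//li[a]') as $li) {
    $texts[] = trim($li->textContent);
}

var_export($texts);
```

With this sample input, $texts ends up holding 'one' and 'three'; the item without a link is skipped.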

Upvotes: 0

Lawrence Cherone

Reputation: 46610

First make sure each <a> tag is closed; then you can do it like this:

<?php 
$html = '<li> <a href="link1"> one </a> </li>
<li> <a href="link2"> two </a> </li>
<li> <a href="link3"> three </a> </li>
';

// Create a new DOM Document
$xml = new DOMDocument();

// Load the html contents into the DOM
$xml->loadHTML($html);

// Empty array to hold the text of each link
$result = array();

//Loop through each <li> tag in the dom
foreach($xml->getElementsByTagName('li') as $li) {
    //Loop through each <a> tag within the li, then extract the node value
    foreach($li->getElementsByTagName('a') as $links){
        $result[] = $links->nodeValue;
    }
}
// Print the collected link texts
print_r($result);
/*
Array
(
    [0] =>  one 
    [1] =>  two 
    [2] =>  three 
)

*/
?>

It's all in the manual for DOMDocument.
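
If only one specific text is wanted (the question mentions two), a small sketch along these lines might do. It assumes valid HTML and assumes the link can be identified by its href value; the $two variable is a name chosen here for illustration. DOMXPath::evaluate() with the XPath string() function returns the text of the first matching node, or an empty string when nothing matches.

```php
<?php
$html = '<ul>
<li><a href="link1"> one </a></li>
<li><a href="link2"> two </a></li>
<li><a href="link3"> three </a></li>
</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Assumption: the wanted link is the one whose href is "link2".
// string(...) yields the text of the first match, or '' if there is none.
$two = trim($xpath->evaluate('string(//li/a[@href="link2"])'));

// Store it in an array for later use.
$result = [$two];
print_r($result);
```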

Upvotes: 4

Varol

Reputation: 1858

Consider using Simple HTML DOM Parser for this. Sample code:

// include the simple html dom parser
include 'simple_html_dom.php'; 

// load the html with one of the suitable methods it provides
$html = str_get_html('<li><a href="link1">one</a></li><li><a href="link2">two</a></li>');

// create a blank array to store the results
$items = array();

// loop through "li" elements and store the magic plaintext attribute value inside $items array
foreach( $html->find('li') as $li ) $items[] = $li->plaintext;

// this should output: Array ( [0] => one [1] => two ) 
print_r( $items );

Upvotes: 0
