Arash Howaida

Reputation: 2617

Storing attributes and inner html of sibling elements in PHP

I'm trying to extract values from an HTML page and store them in a simple array of arrays. It will hold only two arrays of three items each. I define it like so; for now it contains just the header row:

$fileContents = array(
    array('Date', 'Title', 'Link')
);

The html has the following structure:

<li class='my-list'>
    <div class='my-meta'>
        <span class='my-date'>06/08/2018</span>
    </div>
    <a href='https://www.example.com/'>My Title </a>
</li>

This structure repeats a few times. I only need the first one from the top (the latest one). I can see that all the information I need for my array is there: Date is 06/08/2018, Title is My Title, and Link is https://www.example.com/. But I don't know how to access them, particularly the Title and Link, because there are no classes on those elements. To clarify further, I want this as the end result (it's a CSV):

Date, Title, Link
06/08/2018, My Title, https://www.example.com/

This is my current approach. The only value I know how to get is the Date:

$dateClassName="my-date";

$xpath = new DomXpath($doc);
$dateList = $xpath->query("//span[contains(@class, '$dateClassName')]");
$dateNode = $dateList->item(0);

function innerHTML($node) {
    return implode(array_map([$node->ownerDocument, "saveHTML"],
            iterator_to_array($node->childNodes)));
}

$textArray = array();
array_push($textArray, innerHTML($dateNode));

I'm not sure how to get and store the remaining items (Link and Title), because there are no classes on those elements.

Question: Given my existing approach above, what more can I do to store the values I need from the HTML if the elements in question do not have an overt class to search by? Can I somehow get them by virtue of their relative sibling positions?

Upvotes: 0

Views: 29

Answers (1)

u_mulder

Reputation: 54796

Here's a simple snippet that gets everything you need:

$s = "<ul>
    <li class='my-list'>
        <div class='my-meta'>
            <span class='my-date'>06/08/2018</span>
        </div>
        <a href='https://www.example.com/'>My Title </a>
    </li>
    <li class='my-list'>
        <div class='my-meta'>
            <span class='my-date'>06/08/2017</span>
        </div>
        <a href='https://www.example.com/2'>My Title2 </a>
    </li>
</ul>";

$doc = new DOMDocument();
$doc->loadHTML($s);
$xpath = new DOMXPath($doc);

// Take the first <li> (the latest entry) and read everything relative to it.
$li = $xpath->query("//li")->item(0);
$a = $li->getElementsByTagName('a')->item(0);
$span = $li->getElementsByTagName('div')->item(0)->getElementsByTagName('span')->item(0);

var_dump($a->getAttribute('href')); // Link
var_dump($span->textContent);       // Date
var_dump($a->textContent);          // Title (wrap in trim() to drop the trailing space)

As you can see, you can keep working with $li because it is a DOMElement object, so methods such as getElementsByTagName() and getAttribute() are available on it.
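If you would rather start from the date node you already have and reach the link through its relative position, as the question asks, DOMXPath::query() accepts a context node as its second argument. Here is a minimal sketch of that idea, not the only way to do it. The function name extractLatestEntry() and the variable names are made up for illustration, and it assumes the markup always looks like the sample (the date span sits inside a div whose next sibling link is the title).

```php
<?php
// Hypothetical helper: returns [date, title, link] for the first entry.
// Assumes the structure from the question: span.my-date inside a div,
// followed by an <a> that is a sibling of that div.
function extractLatestEntry(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // First date span in document order (same query as in the question).
    $dateNode = $xpath->query("//span[contains(@class, 'my-date')]")->item(0);

    // Relative query: go up to the parent div, then take its first
    // following sibling that is an <a> element.
    $linkNode = $xpath->query("../following-sibling::a[1]", $dateNode)->item(0);

    return [
        trim($dateNode->textContent),
        trim($linkNode->textContent),
        $linkNode->getAttribute('href'),
    ];
}

// Sample markup copied from the answer above.
$s = "<ul>
    <li class='my-list'>
        <div class='my-meta'>
            <span class='my-date'>06/08/2018</span>
        </div>
        <a href='https://www.example.com/'>My Title </a>
    </li>
    <li class='my-list'>
        <div class='my-meta'>
            <span class='my-date'>06/08/2017</span>
        </div>
        <a href='https://www.example.com/2'>My Title2 </a>
    </li>
</ul>";

// Header row from the question plus the extracted row.
$fileContents = [
    ['Date', 'Title', 'Link'],
    extractLatestEntry($s),
];

// Write the rows as CSV to standard output; swap in a file path as needed.
$out = fopen('php://output', 'w');
foreach ($fileContents as $line) {
    fputcsv($out, $line, ',', '"', '\\');
}
fclose($out);
```

Since only text is needed here, textContent replaces the innerHTML() helper. If the link might not be a direct sibling of the div, a query such as "ancestor::li[1]//a" from the same context node is a looser alternative.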

Upvotes: 1
