I'm trying to find and store values from an HTML page, so I have a simple array of arrays. It will only ever hold 2 arrays of 3 items each. I define it like this (these are just the headers):
```php
$fileContents = array(
    array('Date', 'Title', 'Link')
);
```
The HTML has the following structure:
```html
<li class='my-list'>
  <div class='my-meta'>
    <span class='my-date'>06/08/2018</span>
  </div>
  <a href='https://www.example.com/'>My Title </a>
</li>
```
This structure repeats a few times. I only need the first one from the top (the latest one). I can see that all the information I need for my array is there: Date is `06/08/2018`, Title is `My Title`, and Link is `https://www.example.com/`. But I don't know how to access them, particularly the Title and Link, because those elements have no classes. To clarify further, this is the end result I want (it's a CSV):
```
Date, Title, Link
06/08/2018, My Title, https://www.example.com/
```
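Once the three values have been extracted, PHP's built-in `fputcsv()` can write the rows out. A minimal sketch, assuming the header row and one data row are already in `$fileContents` (the data values are hard-coded here from the example above). It writes to an in-memory stream only so the result can be inspected; for a real file you would `fopen()` a path of your choosing instead.

```php
<?php
// Header row as defined above, plus the one data row we want (hard-coded for illustration)
$fileContents = array(
    array('Date', 'Title', 'Link'),
    array('06/08/2018', 'My Title', 'https://www.example.com/'),
);

// In-memory stream; swap for fopen('your-file.csv', 'w') to write a real file
$fh = fopen('php://memory', 'w+');
foreach ($fileContents as $row) {
    // Delimiter, enclosure and escape passed explicitly (avoids the PHP 8.4 deprecation notice)
    fputcsv($fh, $row, ',', '"', '\\');
}

// Read the generated CSV back into a string
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $csv;
```

Note that `fputcsv()` does not add a space after each comma, and it may wrap fields that contain spaces (such as the title) in double quotes; CSV readers treat that the same as the unquoted value.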
This is my current approach. So far, the only value I know how to get is the Date:
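```php
// $doc is assumed to be a DOMDocument already loaded with the page's HTML
$dateClassName = "my-date";
$xpath = new DOMXPath($doc);
// All <span> elements whose class contains "my-date"; item(0) is the first (latest) one
$dateList = $xpath->query("//span[contains(@class, '$dateClassName')]");
$dateNode = $dateList->item(0);

// Returns the HTML of all of $node's children, concatenated
function innerHTML($node) {
    return implode(array_map([$node->ownerDocument, "saveHTML"],
        iterator_to_array($node->childNodes)));
}

$textArray = array();
array_push($textArray, innerHTML($dateNode));
```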
I'm not sure how to get the remaining items (Title and Link), because those elements have no classes.
Question: Given my existing approach, how can I get the values I need from the HTML when the elements in question have no class to search by? Can I select them by their position relative to their siblings?
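XPath axes do allow selecting by relative position. A minimal sketch, assuming the markup shown above (the `<a>` follows `<div class='my-meta'>` inside the same `<li>`); `$html` is a stand-in for the real page. It starts from the date `<span>` found by class, steps up to its parent `<div>` with `..`, then takes the first following `<a>` sibling, passing `$dateNode` as the context node of `DOMXPath::query()`.

```php
<?php
// Stand-in for the real page: two entries with the structure shown above
$html = "<ul>
<li class='my-list'><div class='my-meta'><span class='my-date'>06/08/2018</span></div>
<a href='https://www.example.com/'>My Title </a></li>
<li class='my-list'><div class='my-meta'><span class='my-date'>06/08/2017</span></div>
<a href='https://www.example.com/2'>My Title2 </a></li>
</ul>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// First (latest) date span, found by class as in the question
$dateNode = $xpath->query("//span[contains(@class, 'my-date')]")->item(0);

// Relative to the span: up to <div class='my-meta'>, then its first following <a> sibling
$linkNode = $xpath->query("../following-sibling::a[1]", $dateNode)->item(0);

$row = array(
    trim($dateNode->textContent),     // Date
    trim($linkNode->textContent),     // Title, with the trailing space removed
    $linkNode->getAttribute('href'),  // Link
);
print_r($row);
```

Because the path is relative to `$dateNode`, it stays inside the same `<li>` and does not pick up links from later entries.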
Here's some simple code that gets everything you need:
```php
$s = "<ul>
<li class='my-list'>
<div class='my-meta'>
<span class='my-date'>06/08/2018</span>
</div>
<a href='https://www.example.com/'>My Title </a>
</li>
<li class='my-list'>
<div class='my-meta'>
<span class='my-date'>06/08/2017</span>
</div>
<a href='https://www.example.com/2'>My Title2 </a>
</li>
</ul>";

$doc = new DOMDocument();
$doc->loadHTML($s);
$xpath = new DOMXPath($doc);

// Take the first <li> (the latest entry)
$li = $xpath->query("//li");
$li = $li->item(0);

// Link: the href attribute of the <a> inside this <li>
var_dump($li->getElementsByTagName('a')[0]->getAttribute('href'));
// Date: the text of the <span> inside the <div>
var_dump($li->getElementsByTagName('div')[0]->getElementsByTagName('span')[0]->textContent);
// Title: the text of the <a> (includes the trailing space from the source; trim() it if needed)
var_dump($li->getElementsByTagName('a')[0]->textContent);
```
As you can see, you can work with `$li` directly, since it is an object of type `DOMElement`; calling `getElementsByTagName()` on it searches only inside that `<li>`.
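To connect this back to the question's `$fileContents` array, the extracted values can be appended as the second row. A minimal sketch, assuming the same markup as `$s` above (shortened here to a single entry); it uses the same `DOMElement` traversal as the answer and adds `trim()` because the link text in the source ends with a space.

```php
<?php
// Same structure as $s above, shortened to a single entry
$s = "<ul>
<li class='my-list'>
<div class='my-meta'>
<span class='my-date'>06/08/2018</span>
</div>
<a href='https://www.example.com/'>My Title </a>
</li>
</ul>";

$doc = new DOMDocument();
$doc->loadHTML($s);
$xpath = new DOMXPath($doc);

// First <li> on the page (the latest entry) and its link element
$li = $xpath->query("//li")->item(0);
$a = $li->getElementsByTagName('a')->item(0);

// Header row from the question, then the extracted values as the second row
$fileContents = array(
    array('Date', 'Title', 'Link'),
);
$fileContents[] = array(
    trim($li->getElementsByTagName('span')->item(0)->textContent), // Date
    trim($a->textContent),                                         // Title
    $a->getAttribute('href'),                                      // Link
);

print_r($fileContents);
```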