petervaz

Reputation: 14205

How can I map these fields using XPath?

I'm trying to map a list of fields from a website using PHP's DOMXPath object and I'm struggling with it. I tried reading the values by absolute position, but that breaks when a field is missing. I figured it might be possible to use the field names, which are wrapped in strong tags, to find the correct values. How can I do this?

Website sample:

<div class="container">
    <strong>field1: </strong>
    <a href="http://link/1">value1</a>
    <a href="http://link/2">value2</a>
    <br>
    <strong>field2:</strong>
    <a href="http://link/3">value3</a>
    <br>
    <strong>field3:</strong>
    <a href="http://link/4">value4</a>
</div>

I need something like:

$result = array(
    'field1' => array(
        'value1',
        'value2'
    ),
    'field2' => 'value3',
    'field3' => 'value4'
);

or

$result = array(
    'field1' => 'value1 value2',
    'field2' => 'value3',
    'field3' => 'value4'
);

A working example would be much appreciated, since I'm just beginning with this subject.

Upvotes: 0

Views: 132

Answers (1)

user142162

$dom = new DOMDocument();
$dom->loadHTML($str); // Or however you load your HTML

$xpath = new DOMXPath($dom);
$items = $xpath->query('//div[@class = "container"]/strong');


$arr = array();
for($i = 0; $i < $items->length; $i++)
{
    $node = $items->item($i);
    $name = trim($node->nodeValue, ': ');
    $node_items = array();
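    // Walk the following siblings, skipping text nodes (whitespace),
    // and collect <a> values until another element (<br>, <strong>) appears.
    while(($node = $node->nextSibling) !== NULL)
    {
        if($node->nodeType != XML_ELEMENT_NODE)
        {
            continue;
        }
        if($node->nodeName != 'a')
        {
            break;
        }
        $node_items[] = $node->nodeValue;
    }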

    $arr[$name] = count($node_items) == 1 ? $node_items[0] : $node_items;
}

Gives the result ($arr):

Array
(
    [field1] => Array
        (
            [0] => value1
            [1] => value2
        )

    [field2] => value3
    [field3] => value4
)
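
If you would rather have the second format from the question (one space-separated string per field), here is a minimal sketch that lets XPath do the grouping instead of walking siblings by hand. It assumes every link belongs to the nearest preceding strong tag inside the same container; the function name mapFields is made up for this example and is not part of any library.

```php
<?php
// Sample markup copied from the question.
$str = <<<HTML
<div class="container">
    <strong>field1: </strong>
    <a href="http://link/1">value1</a>
    <a href="http://link/2">value2</a>
    <br>
    <strong>field2:</strong>
    <a href="http://link/3">value3</a>
    <br>
    <strong>field3:</strong>
    <a href="http://link/4">value4</a>
</div>
HTML;

// Hypothetical helper: maps each <strong> label to its links, joined by spaces.
function mapFields(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $result = array();
    foreach ($xpath->query('//div[@class = "container"]/strong') as $strong) {
        $name = trim($strong->nodeValue, ": \n\r\t");

        // 1-based position of this <strong> among its <strong> siblings.
        $k = (int) $xpath->evaluate('count(preceding-sibling::strong)', $strong) + 1;

        // A link with exactly $k <strong> elements before it belongs to this field.
        $expr = 'following-sibling::a[count(preceding-sibling::strong) = ' . $k . ']';

        $values = array();
        foreach ($xpath->query($expr, $strong) as $a) {
            $values[] = trim($a->nodeValue);
        }
        $result[$name] = implode(' ', $values);
    }
    return $result;
}

$arr = mapFields($str);
print_r($arr);
```

With this approach a field that has no links simply maps to an empty string, so a missing value should not shift the others.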

Upvotes: 1
