cr92 cr

Reputation: 13

PHP scrape links from table

How can I get only one link per row from the table?

<table>
        <tr class="title">
        <td width="40%">a </td>
        <td width="40%">b</td>
        <td width="10%">c</td>
        <td width="10%">d</td>
        </tr>
        <tr>
        <td>abc.com</td>
        <td>123.123.526.12</td>
        <td><a class="update" href="fruit/grape"></a></td>
        <td><a class="delete" href="fruit/grape"></a></td>
        <td> </td>
        </tr>
        <tr>
        <td>bcd.com</td>
        <td>123.256.33.123</td>
        <td><a class="update" href="fruit/apple"></a></td>
        <td><a class="delete" href="fruit/apple"></a></td>
        <td> </td>
        </tr>
        </table>

My code:

$html_doc = new DOMDocument;
libxml_use_internal_errors(true);
$html_doc->loadHTML($html);
libxml_clear_errors();
$html_xpath = new DOMXPath($html_doc);

$link1 = $html_xpath->query('//table/tr[not(contains(@class,"title"))]');
foreach($link1 as $a)
{   
    $bac = $a->nodeValue;
    echo $bac."<br>";
    $rows = $a->getElementsByTagName("a");
    foreach ($rows as $row)
    {
        echo $row->getAttribute("href")."<br>";
    }
}

Output:

        abc.com 123.123.526.12
        fruit/grape
        fruit/grape
        bcd.com 123.256.33.123
        fruit/apple
        fruit/apple

The code above returns two href attributes for each row. My expected output is one href attribute per row.

My expected Output:

        abc.com 123.123.526.12
        fruit/grape
        bcd.com 123.256.33.123
        fruit/apple

How can I change the code to get my expected output?

Upvotes: 0

Views: 105

Answers (1)

redelschaap

Reputation: 2814

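Well, that is because you echo the href of every anchor, and each row contains two anchors (update and delete) that point to the same URL. You could collect the links in an array per row and only add a link if you have not already collected it: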

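// Maps each row's text to the list of unique hrefs found in that row.
$all_links = array();

foreach($link1 as $a)
{   
    // The row's text content is used as the array key.
    $bac = $a->nodeValue;
    $all_links[$bac] = array();
    $rows = $a->getElementsByTagName("a");

    foreach ($rows as $row)
    {
        $href = $row->getAttribute("href");
        // Skip hrefs that were already collected for this row.
        if (!in_array($href, $all_links[$bac])) {
            $all_links[$bac][] = $href;
        }
    }
}

print_r($all_links);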

Link to fiddle: http://phpfiddle.org/main/code/e9y3-23gw

My output:

Array
(
    [abc.com        123.123.526.12] => Array
        (
            [0] => fruit/grape
        )

    [bcd.com        123.256.33.123] => Array
        (
            [0] => fruit/apple
        )

)
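
For reference, the same idea can be sketched as a self-contained script that prints the rows in the format the question asks for. This is only a sketch under a few assumptions: the inline `$html` string is a trimmed copy of the table from the question, the names `$result`, `$label` and `$hrefs` are made up for illustration, and `array_unique` is swapped in for the `in_array` check above.

```php
<?php
// Sample markup modelled on the table in the question (assumption: trimmed to the relevant cells).
$html = <<<'HTML'
<table>
  <tr class="title"><td>a</td><td>b</td><td>c</td><td>d</td></tr>
  <tr>
    <td>abc.com</td><td>123.123.526.12</td>
    <td><a class="update" href="fruit/grape"></a></td>
    <td><a class="delete" href="fruit/grape"></a></td>
  </tr>
  <tr>
    <td>bcd.com</td><td>123.256.33.123</td>
    <td><a class="update" href="fruit/apple"></a></td>
    <td><a class="delete" href="fruit/apple"></a></td>
  </tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical result map: row label => list of unique hrefs in that row.
$result = [];

foreach ($xpath->query('//table//tr[not(contains(@class, "title"))]') as $tr) {
    // Build a compact label from the non-empty cells of the row.
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $text = trim($td->textContent);
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    $label = implode(' ', $cells);

    // Collect every href in the row, then drop the duplicates.
    $hrefs = [];
    foreach ($xpath->query('.//a[@href]', $tr) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    $result[$label] = array_values(array_unique($hrefs));
}

// Print each row label followed by its unique links.
foreach ($result as $label => $hrefs) {
    echo $label, "\n";
    foreach ($hrefs as $href) {
        echo $href, "\n";
    }
}
```

If the update and delete anchors in a row always share the same href, querying only `.//a[@class="update"]` inside each row would probably make the de-duplication step unnecessary.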

Upvotes: 1
