Parsing HTML Table Data from XML with PHP

Question

I am fairly new to PHP and can't work out what I am doing wrong here.

Problem: I am trying to get the href of a specific HTML element from a string of HTML stored inside an XML element in Reddit's feed. If you visit the page, it is the actual link to the video: the external YouTube link (or whichever host it is), not the Reddit link, and nothing else.

Here is my code so far (code updated):

Update: Loop-mania! I now get all of the hrefs, but I am trying to store them in a global array so that I can access a random one outside of this function.

function getXMLFeed() {
    echo "Reddit Items\n\n";
    //$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
    $feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
    $xml = simplexml_load_file($feedURL);
    // Treat each XML entry from Reddit as an item
    foreach ($xml->entry as $item) {
        foreach ($item->content as $content) {
            // str_get_html() comes from the Simple HTML DOM library
            $newContent = (string)$content;
            $html = str_get_html($newContent);

            foreach ($html->find('table') as $table) {
                // First span in the table; find() returns null when there is none
                $links = $table->find('span', 0);
                if ($links === null) {
                    continue;
                }
                //echo $links;
                foreach ($links->find('a') as $link) {
                    echo $link->href;
                }
            }
        }
    }
}

XML Code: http://pasted.co/0bcf49e8

I've also included JSON if it can be done this way; I just preferred XML: http://pasted.co/f02180db

That is pretty much all of the code. Though, here is another piece I tried to use with DOMDocument (scrapped it).

    foreach ($item->content as $content) {
        $dom = new DOMDocument();
        $dom->loadHTML($content);
        $xpath = new DOMXPath($dom);
        $classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";

        foreach ($dom->getElementsByTagName('table') as $node) {
            echo $dom->saveHTML($node), PHP_EOL;
            //$originalURL = $node->getAttribute('href');
        }

        //$html = $dom->saveHTML();
    }

I can parse the table fine, but when it comes to getting a specific element's value (nothing has an ID or class), I can only seem to get ALL anchor tags, ALL table rows, and so on.

Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!

Added HTML: I am specifically trying to extract [link] from each table/item. http://pastebin.com/QXa2i6qz

Wolverine · Accepted Answer

The following code extracts the YouTube link from each entry's content.

function extract_youtube_link($xml) {
    $entries = $xml['entry'];
    $videos = [];
    foreach($entries as $entry) {
        $content = html_entity_decode($entry['content']);
        // Capture the href of the anchor whose text is [link]
        preg_match_all('/<a href="([^"]+)">\[link\]<\/a>/', $content, $matches);
        if(!empty($matches[1][0])) {
            $videos[] = array(
                'entry_title' => $entry['title'],
                'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
                'author_reddit_url' => $entry['author']['uri'],
                'video_url' => $matches[1][0]
            );
        }
    }

    return $videos;
}

$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);

foreach($videos as $video) {
    echo "Entry Title: {$video['entry_title']}", PHP_EOL;
    echo "Author: {$video['author']}", PHP_EOL;
    echo "Author URL: {$video['author_reddit_url']}", PHP_EOL;
    echo "Video URL: {$video['video_url']}", PHP_EOL;
    echo PHP_EOL;
}

The function returns a multidimensional array in which each element contains entry_title, author, author_reddit_url and video_url. Hope it helps!
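
Since the question also asks about picking a random link outside the function, here is a minimal sketch of that step. It assumes an array with the four keys listed above; the sample entries and the helper name pick_random_video are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical sample data with the same keys as the array of videos.
$videos = [
    [
        'entry_title' => 'First video',
        'author' => 'example_user',
        'author_reddit_url' => 'https://www.reddit.com/user/example_user',
        'video_url' => 'https://www.youtube.com/watch?v=aaaaaaaaaaa',
    ],
    [
        'entry_title' => 'Second video',
        'author' => 'another_user',
        'author_reddit_url' => 'https://www.reddit.com/user/another_user',
        'video_url' => 'https://www.youtube.com/watch?v=bbbbbbbbbbb',
    ],
];

/**
 * Return one random entry, or null when the list is empty.
 */
function pick_random_video(array $videos): ?array
{
    if (empty($videos)) {
        return null; // array_rand() fails on an empty array
    }
    // array_rand() returns a random key of the array
    return $videos[array_rand($videos)];
}

$random = pick_random_video($videos);
echo $random['video_url'], PHP_EOL;
```

Returning the array from the function and passing it around like this avoids the need for a global variable.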

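If you would rather not rely on a regular expression, a DOM-based variant is possible too. This is only a sketch, assuming PHP's DOM extension is enabled; the $html sample is a simplified, made-up stand-in for one decoded content element, and find_link_href is a hypothetical helper name. It selects the anchor by its visible text ([link]) instead of by its position in the table.

```php
<?php
// Hypothetical, simplified stand-in for the decoded HTML of one content element.
$html = '<table><tr>'
      . '<td><a href="https://www.reddit.com/r/videos/comments/abc123/"><img src="thumb.jpg" alt="" /></a></td>'
      . '<td>submitted by <a href="https://www.reddit.com/user/example_user">/u/example_user</a><br/>'
      . '<span><a href="https://www.youtube.com/watch?v=aaaaaaaaaaa">[link]</a></span> '
      . '<span><a href="https://www.reddit.com/r/videos/comments/abc123/">[comments]</a></span></td>'
      . '</tr></table>';

/**
 * Return the href of the first anchor whose text is "[link]", or null if none.
 */
function find_link_href(string $html): ?string
{
    $dom = new DOMDocument();
    // Feed fragments are rarely valid documents, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match on the anchor text rather than on a fixed path through the table.
    $nodes = $xpath->query('//a[normalize-space(.) = "[link]"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $anchor = $nodes->item(0);
    return $anchor instanceof DOMElement ? $anchor->getAttribute('href') : null;
}

echo find_link_href($html), PHP_EOL;
```

Matching on the text avoids hard-coding a path such as table[1]/tbody/tr/td[2], which breaks as soon as the markup shifts slightly.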