Placeholder

Reputation: 689

Content working fine in print but not in array

I can scrape the content from my target website without any problems, and it prints correctly. But as you can see in my demo, the content entry in my array only ever shows the source line, no matter what I change.

$page = (isset($_GET['p'])&&$_GET['p']!=0) ? (int) $_GET['p'] : '';  
$html = file_get_html('http://screenrant.com/movie-news/'.$page);
foreach($html->find('#site-top ul h2 a') as $element)
{
        print '<br><br>';
        echo $url = ''.$element->href;
        $html2 = file_get_html($url);
        print '<br><br>';

        $image = $html2->find('meta[property=og:image]',0);
        print $news['image'] = $image->content;
        print '<br><br>';

        // Ending The Featured Image
        $title = $html2->find(' header > h1',0);
        print $news['title'] = $title->plaintext;

        print '<br>';
        // Ending the titles
        print '<br>';

        $articles = $html2->find('div.top-content > article > p');
        foreach ($articles as $article) {
            echo "$article->plaintext<p>";
        }
        $news['content'] =  $article->plaintext;

        print '<br><br>';
        #post> div:nth-child(2) > header > p > time
        $date = $html2->find('header > p > time',0);
        $news['date'] = $date->plaintext;

        $dexp = explode(', ',$date);

        print $date = $dexp[0].', '.$dexp[1];

        print '<br><br>';

        $genre = "news";
        print '<br>';

             mysqli_query($DB,"INSERT INTO `wp_scraped_news` SET
                                    `hash` = '".$news['title']."',
                                    `title` = '".$news['title']."',
                                    `image` = '".$news['image']."',
                                    `content` = '".$news['content']."'");
             print '<pre>';print_r($news);print '</pre>';
}

Currently using simple_html_dom.php to scrape.


Upvotes: 1

Views: 39

Answers (1)

Matt

Reputation: 2869

If you take a look at this piece of code:

$articles = $html2->find('div.top-content > article > p');
foreach ($articles as $article) {
   echo "$article->plaintext<p>"; 
   // This prints the article content paragraph by paragraph.
}
$news['content'] =  $article->plaintext; 
// This runs after the foreach has finished, so $article still holds only the
// last <p> of the article (the source line), and that is all that gets stored.

Instead, append each paragraph inside the loop, starting from an empty string for every article:

$articles = $html2->find('div.top-content > article > p');
$news['content'] = ''; // reset per article so content does not carry over
foreach ($articles as $article) {
    echo "$article->plaintext<p>"; 
    $news['content'] .= $article->plaintext . "<p>";
}
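
To see the difference in isolation, here is a minimal sketch that uses a plain array of strings in place of the simple_html_dom elements. The function names `lastParagraphOnly` and `joinParagraphs` and the sample paragraphs are made up for illustration; they are not part of your script or of the library.

```php
<?php
// Mirrors the original code: nothing is stored inside the loop, so after
// the foreach ends the loop variable still holds only the final item.
function lastParagraphOnly(array $paragraphs): string
{
    $paragraph = '';
    foreach ($paragraphs as $paragraph) {
        // The original only echoed here; nothing was accumulated.
    }
    return $paragraph;
}

// Mirrors the fix: start from an empty string and append every paragraph
// inside the loop, using "<p>" as the separator like the original echo.
function joinParagraphs(array $paragraphs): string
{
    $content = '';
    foreach ($paragraphs as $paragraph) {
        $content .= $paragraph . '<p>';
    }
    return $content;
}

// Hypothetical sample data standing in for each $article->plaintext.
$sample = ['First paragraph.', 'Second paragraph.', 'Source: Example'];

$lastOnly = lastParagraphOnly($sample); // only the source line
$joined   = joinParagraphs($sample);    // all three paragraphs
```

In your script the same idea applies to `$news['content']`: it has to be built up inside the inner foreach rather than assigned once after it.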

Upvotes: 1
