Reputation: 689
I can scrape the content from my target website without any problems, but as my demo shows, the content entry in my array only ever holds the article's source line (the last paragraph). Nothing I have changed so far fixes it.
$page = (isset($_GET['p'])&&$_GET['p']!=0) ? (int) $_GET['p'] : '';
$html = file_get_html('http://screenrant.com/movie-news/'.$page);
foreach($html->find('#site-top ul h2 a') as $element)
{
    print '<br><br>';
    echo $url = ''.$element->href;
    $html2 = file_get_html($url);
    print '<br><br>';
    $image = $html2->find('meta[property=og:image]',0);
    print $news['image'] = $image->content;
    print '<br><br>';
    // Ending The Featured Image
    $title = $html2->find(' header > h1',0);
    print $news['title'] = $title->plaintext;
    print '<br>';
    // Ending the titles
    print '<br>';
    $articles = $html2->find('div.top-content > article > p');
    foreach ($articles as $article) {
        echo "$article->plaintext<p>";
    }
    $news['content'] = $article->plaintext;
    print '<br><br>';
    #post> div:nth-child(2) > header > p > time
    $date = $html2->find('header > p > time',0);
    $news['date'] = $date->plaintext;
    $dexp = explode(', ',$date);
    print $date = $dexp[0].', '.$dexp[1];
    print '<br><br>';
    $genre = "news";
    print '<br>';
    mysqli_query($DB,"INSERT INTO `wp_scraped_news` SET
        `hash` = '".$news['title']."',
        `title` = '".$news['title']."',
        `image` = '".$news['image']."',
        `content` = '".$news['content']."'");
    print '<pre>';print_r($news);print '</pre>';
}
Currently using simple_html_dom.php to scrape.
Upvotes: 1
Views: 39
Reputation: 2869
If you take a look at this piece of code:
$articles = $html2->find('div.top-content > article > p');
foreach ($articles as $article) {
    echo "$article->plaintext<p>";
    // This prints the article content paragraph by paragraph
}
$news['content'] = $article->plaintext;
// This runs after the foreach, so $article still holds the last <p>
// of the article, which is the source line.
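Instead, you need to append each paragraph inside the loop:
$articles = $html2->find('div.top-content > article > p');
$news['content'] = ''; // reset per page so earlier articles don't carry over
foreach ($articles as $article) {
    echo "$article->plaintext<p>";
    // Append every paragraph instead of keeping only the last one
    $news['content'] .= $article->plaintext . "<p>";
}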
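If you'd rather not build the string by hand, here is a minimal sketch of the same idea without simple_html_dom. It assumes the paragraph texts have already been pulled out as plain strings; collectContent is a hypothetical helper name, not part of any library, and the sample paragraphs are made up for illustration.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): joins paragraph
// texts into a single string, skipping paragraphs that are empty.
function collectContent(array $paragraphs): string
{
    $parts = [];
    foreach ($paragraphs as $text) {
        $text = trim($text);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    // Blank line between paragraphs; swap the separator for "<p>" if
    // you want to keep the markup used in the loop above.
    return implode("\n\n", $parts);
}

// Made-up sample data standing in for each $article->plaintext value.
$paragraphs = ['First paragraph.', 'Second paragraph.', 'Source: Example'];
$news = ['content' => collectContent($paragraphs)];
echo $news['content'];
```

Because the helper returns a fresh string on every call, nothing carries over from one scraped page to the next.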
Upvotes: 1