Reputation: 2144
I'm trying to scrape a page and I'm not very familiar with php frameworks, so I've been trying to learn Symfony2. I have it up and running, and now I'm trying to use Goutte. It's installed in the vendor folder, and I have a bundle I'm using for my scraping project.
Question is, is it good practice to do scraping from a Controller
? And how? I have searched forever and cannot figure out how to use Goutte
from a bundle, since it's buried deep withing the file structure.
<?php
namespace ontf\scraperBundle\Controller;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Goutte\Client;
class ThingController extends Controller
{
public function somethingAction($something)
{
$client = new Client();
$crawler = $client->request('GET', 'http://www.symfony.com/blog/');
echo $crawler->text();
return $this->render('scraperBundle:Thing:index.html.twig');
// return $this->render('scraperBundle:Thing:index.html.twig', array(
// 'something' => $something
// ));
}
}
Upvotes: 2
Views: 1276
Reputation: 7582
I'm not sure I have heard of "good practices" as far as scraping goes but you may be able to find some in the book PHP Architect's Guide to Web Scraping with PHP.
These are some guidelines I have used in my own projects:
php app/console scraper:run example.com --env=prod --no-debug
Where app/console is where the Symfony2 console applicaiton lives, scraper:run is the name of your command, example.com is an argument to indicate the page you want to scrape, and the --env=prod --no-debug are the flags you should use to run in production. see code below for example. Ontf/ScraperBundle/Resources/services.yml
services:
goutte_client:
class: Goutte\Client
scraperCommand:
class: Ontf\ScraperBundle\Command\ScraperCommand
arguments: ["@goutte_client"]
tags:
- { name: console.command }
And your command should look something like this:
<?php
// Ontf/ScraperBundle/Command/ScraperCommand.php
namespace Ontf\ScraperBundle\Command;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Goutte\Client;
abstract class ScraperCommand extends Command
{
private $client;
public function __construct(Client $client)
{
$this->client = $client;
parent::__construct();
}
protected function configure()
{
->setName('scraper:run')
->setDescription('Run Goutte Scraper.')
->addArgument(
'url',
InputArgument::REQUIRED,
'URL you want to scrape.'
);
}
protected function execute(InputInterface $input, OutputInterface $output)
{
$url = $input->getArgument('url');
$crawler = $this->client->request('GET', $url);
echo $crawler->text();
}
}
Upvotes: 4
Reputation: 17906
You Should take a Symfony-Controller if you want to return a response, e.G a html output.
if you only need the function for calculating or storing stuff in database, You should create a Service class that represents the functionality of your Crawler, e.G
class CrawlerService
{
function getText($url){
$client = new Client();
$crawler = $client->request('GET', $url);
return $crawler->text();
}
and to execute it i would use a Console Command
If you want to return a Response use a Controller
Upvotes: 0