Kiera Knight

Reputation: 31

How do I start running a Portia spider?

The syntax given at https://github.com/scrapinghub/portia#running-a-portia-spider is:

portiacrawl PROJECT_PATH SPIDER_NAME

I tried running

portiacrawl D:/portia-master/slyd/data/projects/darkwoods example
portiacrawl slyd/data/projects/darkwoods example
portiacrawl slyd/data/projects/darkwoods

But they all give me the same help message:

Usage: portiacrawl <project dir/project zip> [spider] [options]

Allow to easily run slybot spiders on console. If spider is not given, print a
list of available spiders inside the project

Options:
  -h, --help            show this help message and exit
  --settings=SETTINGS   Give specific settings module (must be on python path)
  --logfile=LOGFILE     Specify log file
  -a NAME=VALUE         Add spider arguments
  -s NAME=VALUE         Add extra scrapy settings
  -o FILE, --output=FILE
                        dump scraped items into FILE (use - for stdout)
  -t FORMAT, --output-format=FORMAT
                        format to use for dumping items with -o (default:
                        jsonlines)
  -v, --verbose         more verbose

I am very new to Portia, so I am confused about what to do. Can anyone give me an example of what I should write for PROJECT_PATH? I am currently running Portia via Vagrant.

Upvotes: 1

Views: 2315

Answers (3)

siegfried415

Reputation: 1

I have created portia-dashboard, which you can find on GitHub; a Docker image is also available on Docker Hub. With portia-dashboard, you can deploy a project, start a spider, or monitor job status with a few mouse clicks in a simple web interface. Refer to its documentation for details on how to start a spider.

Upvotes: 0

Mihai

Reputation: 133

You can use scrapyd to run the spider.

curl http://your_scrapyd_host:6800/schedule.json -d project=your_project_name -d spider=your_spider_name

This way you also get basic monitoring of the spider. I also found a quick and simple web interface for the spider once it has been deployed with scrapyd: https://gist.github.com/MihaiCraciun/78f0a53b7a99587d178b
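If you would rather trigger the job from a script than from curl, the same POST can be sketched in Python with only the standard library. This is a rough sketch, not part of scrapyd itself: the function names build_schedule_request and schedule_spider are made up for illustration, and the host, project, and spider values are placeholders you need to replace with your own.

```python
import json
import urllib.parse
import urllib.request


def build_schedule_request(host, project, spider):
    """Build (but do not send) the POST request for scrapyd's schedule.json.

    Hypothetical helper: host, project and spider are placeholders.
    """
    url = "http://%s:6800/schedule.json" % host
    # Same form fields as the two -d options in the curl command
    form = {"project": project, "spider": spider}
    data = urllib.parse.urlencode(form).encode("ascii")
    # Passing data makes urllib issue a POST, like curl -d does
    return urllib.request.Request(url, data=data)


def schedule_spider(host, project, spider):
    """Send the request and return scrapyd's reply decoded from JSON."""
    request = build_schedule_request(host, project, spider)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling schedule_spider("your_scrapyd_host", "your_project_name", "your_spider_name") needs a reachable scrapyd instance; the reply is the JSON that scrapyd sends back for the scheduled job.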

Hope it helps!

Upvotes: 0

Kiera Knight

Reputation: 31

I forgot which question it was, but someone mentioned changing (cd) to the right directory before running the portiacrawl command. After exploring the Vagrant VM for a while, I found the projects directory: it's at /vagrant/slyd/data/projects.

So to run portiacrawl, you just have to cd to the Portia directory first and then run:

portiacrawl /vagrant/slyd/data/projects/[project name] [spider] [options]

I ran this command and it worked:

portiacrawl /vagrant/slyd/data/projects/darkwoods example
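If you end up running several spiders, a small Python wrapper can build the same command line for you. This is only a sketch under the assumptions above: PROJECTS_ROOT is the path I found inside my Vagrant VM and may differ on your setup, portiacrawl_command and run_spider are names I made up, and the only option used is -o from the help message.

```python
import posixpath
import subprocess

# Assumption: where the Vagrant VM keeps the Portia projects (see above)
PROJECTS_ROOT = "/vagrant/slyd/data/projects"


def portiacrawl_command(project, spider=None, output=None,
                        projects_root=PROJECTS_ROOT):
    """Return the portiacrawl argument list for a project inside the VM."""
    # Use the absolute project path so the current directory does not matter
    cmd = ["portiacrawl", posixpath.join(projects_root, project)]
    if spider is not None:
        # Without a spider name, portiacrawl lists the available spiders
        cmd.append(spider)
    if output is not None:
        # -o FILE dumps the scraped items into FILE (see the help message)
        cmd.extend(["-o", output])
    return cmd


def run_spider(project, spider, output=None):
    """Run portiacrawl inside the VM and return its exit code."""
    return subprocess.run(portiacrawl_command(project, spider, output)).returncode
```

For example, portiacrawl_command("darkwoods", "example") builds the same command I ran by hand, and run_spider("darkwoods", "example", output="items.jl") would execute it with an output file (items.jl is just an example file name).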

Upvotes: 1
