Reputation: 27
I've been running my Scrapy project with a couple of accounts (the project scrapes a specific site that requires login credentials), but no matter the parameters I set, it always runs with the same ones (same credentials).
I'm running under virtualenv. Is there a variable or setting I'm missing?
Edit:
It seems that this problem is Twisted related.
Even when I run:
scrapy crawl -a user='user' -a password='pass' -o items.json -t json SpiderName
I still get an error saying:
ERROR: twisted.internet.error.ReactorNotRestartable
And all the information I get is from the last 'successful' run of the spider.
Upvotes: 2
Views: 1065
Reputation: 27
Found the problem. My project tree was 'dirty'.
Another developer renamed the file that contained the spider code, and when I updated my local repo with those changes, only the .py version of the old file was deleted; the .pyc file was left behind (because of .hgignore). This made Scrapy find the same spider module twice (since the same spider lived in two different files) and call both under the same Twisted reactor.
After deleting the offending file everything is back to normal.
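A quick way to purge stale compiled files like this (they are regenerated automatically, so deleting all of them is safe) is:

```shell
# Delete every stale .pyc under the project tree; Python will
# recompile the ones that still have a matching .py source.
find . -name '*.pyc' -delete
```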
Upvotes: 1
Reputation: 813
You should check your spider's __init__
method: if it doesn't already accept username and password, pass them in there. Like this:
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class MySpider(BaseSpider):
    name = 'myspider'

    def __init__(self, username=None, password=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.start_urls = ['http://www.example.com/']
        self.username = username
        self.password = password

    def start_requests(self):
        return [FormRequest("http://www.example.com/login",
                            formdata={'user': self.username, 'pass': self.password},
                            callback=self.logged_in)]

    def logged_in(self, response):
        # here you would extract links to follow and return Requests for
        # each of them, with another callback
        pass
Run it:
scrapy crawl myspider -a username=yourname -a password=yourpass
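To see why this works: each -a key=value pair is handed to the spider's __init__ as a keyword argument. A minimal Scrapy-free sketch of that mechanism (BaseSpider here is a stand-in stub, not the real class):

```python
# Stand-in for scrapy's BaseSpider, just to make the sketch runnable.
class BaseSpider(object):
    def __init__(self, *args, **kwargs):
        pass

class MySpider(BaseSpider):
    name = 'myspider'

    def __init__(self, username=None, password=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.username = username
        self.password = password

# "scrapy crawl myspider -a username=yourname -a password=yourpass"
# is roughly equivalent to Scrapy doing:
spider = MySpider(username='yourname', password='yourpass')
print(spider.username)  # yourname
```

If the arguments still don't change between runs, the values aren't reaching __init__ (or are being overwritten afterwards).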
Code adapted from: http://doc.scrapy.org/en/0.18/topics/spiders.html
EDIT: You can have only one Twisted reactor per process, but you can run multiple spiders in the same process with different credentials. Example of running multiple spiders: http://doc.scrapy.org/en/0.18/topics/practices.html#running-multiple-spiders-in-the-same-process
Upvotes: 2