I'm trying to use Scrapy to crawl www.mywebsite.com. The site is hosted on a free host with the URL www.mywebsite.freehost.com, and I am redirecting the free host to my paid domain.
The problem is that Scrapy ignores the redirect, so 0 pages are scraped.
How do I tell Scrapy to crawl the redirected URL? I only need it to crawl the redirected URL, not other URLs that lead out of the website (Facebook pages, etc.).
2016-11-27 14:48:42 [scrapy] INFO: Spider opened
2016-11-27 14:48:42 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-11-27 14:48:42 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-11-27 14:48:44 [scrapy] DEBUG: Crawled (200) <GET http://www.mywebsite.com/> (referer: None)
2016-11-27 14:48:44 [scrapy] DEBUG: Filtered offsite request to 'www.mywebsite.freehost.net': <GET www.mywebsite.freehost.net>
2016-11-27 14:48:44 [scrapy] INFO: Closing spider (finished)
2016-11-27 14:48:44 [scrapy] INFO: Dumping Scrapy stats:
The logs show that your request is being filtered:
DEBUG: Filtered offsite request to 'www.mywebsite.freehost.net': <GET www.mywebsite.freehost.net>
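Scrapy drops any request whose host is not covered by the spider's allowed_domains. Add the domain freehost.net to your allowed_domains list, or remove allowed_domains from your spider to allow every domain. Since you don't want to follow links to outside sites such as Facebook, adding the domain is the better fit.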
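To illustrate why adding the domain helps, here is a rough, standard-library-only sketch of the kind of host check the offsite filter performs. It is not Scrapy's actual code: `build_host_regex` and `is_offsite` are hypothetical helper names, and the URLs are the placeholders from the question. The assumption is that a host passes if it equals an allowed domain or is a subdomain of one, and that an empty list means no restriction.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    """Build a pattern matching an allowed domain or any of its subdomains."""
    if not allowed_domains:
        # Assumption: no allowed_domains means every host is accepted.
        return re.compile("")
    domains = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({domains})$")


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is not covered by allowed_domains."""
    host = urlparse(url).hostname or ""
    return not build_host_regex(allowed_domains).search(host)


redirect_url = "http://www.mywebsite.freehost.net/"

# With only the paid domain listed, the free-host URL is filtered out.
print(is_offsite(redirect_url, ["mywebsite.com"]))

# With freehost.net added, the free-host URL passes, while unrelated
# sites such as Facebook are still filtered.
allowed = ["mywebsite.com", "freehost.net"]
print(is_offsite(redirect_url, allowed))
print(is_offsite("https://www.facebook.com/somepage", allowed))
```

In the spider itself, the equivalent change would be setting allowed_domains = ["mywebsite.com", "freehost.net"] on the spider class.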