Reputation: 807
I am very new to Python and Scrapy, and I have written a crawler in PyCharm as follows:
import scrapy
from scrapy.spiders import Spider
from scrapy.http import Request
import re

class TutsplusItem(scrapy.Item):
    title = scrapy.Field()

class MySpider(Spider):
    name = "tutsplus"
    allowed_domains = ["bbc.com"]
    start_urls = ["http://www.bbc.com/"]

    def parse(self, response):
        links = response.xpath('//a/@href').extract()
        # We stored already crawled links in this list
        crawledLinks = []
        for link in links:
            # If it is a proper link and is not checked yet, yield it to the Spider
            #if linkPattern.match(link) and not link in crawledLinks:
            if not link in crawledLinks:
                link = "http://www.bbc.com" + link
                crawledLinks.append(link)
                yield Request(link, self.parse)
        titles = response.xpath('//a[contains(@class, "media__link")]/text()').extract()
        for title in titles:
            item = TutsplusItem()
            item["title"] = title
            print("Title is : %s" %title)
            yield item
However, when I run the code above, nothing is printed to the screen! What is wrong with my code?
Upvotes: 1
Views: 921
Reputation: 668
To run a spider from within PyCharm, you need to set up a "Run/Debug configuration" properly. Running your_spider.py as a standalone script won't do anything, because the file only defines classes and nothing in it starts a crawl.
As mentioned by @stranac, scrapy crawl is the way to go, with scrapy being the binary and crawl an argument passed to that binary.
Configure Run/Debug
In the main menu, go to Run > Run Configurations...
Set the script to the scrapy binary; this is what will execute.
Set the parameters: in your case, you want to start your spider, so they should look like this: crawl your_spider_name
e.g. crawl tutsplus
Make sure that the Python interpreter is the one where you set up Scrapy and the other packages needed for your project.
Make sure that the working directory is the directory containing settings.py, which is also generated by Scrapy.
From now on you should be able to Run and Debug your spiders from within Pycharm.
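The run configuration described above boils down to one command: the project's interpreter running Scrapy with crawl tutsplus from the project directory. Below is a minimal sketch of that same launch from plain Python. It assumes a Scrapy version that can be started as python -m scrapy (recent releases ship a scrapy/__main__.py), and the helper names build_crawl_command and run_spider are hypothetical, chosen only for illustration.

```python
import subprocess
import sys


def build_crawl_command(spider_name):
    """Return the argv equivalent of 'scrapy crawl <spider_name>'.

    sys.executable is the interpreter running this script, so Scrapy is
    launched from the same environment (the one where Scrapy is installed).
    """
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def run_spider(spider_name, project_dir):
    """Run the crawl with project_dir as the working directory.

    project_dir plays the role of the run configuration's working
    directory, i.e. the Scrapy project folder. Returns the exit code.
    """
    completed = subprocess.run(build_crawl_command(spider_name), cwd=project_dir)
    return completed.returncode
```

For example, run_spider("tutsplus", "/path/to/your/project") (the path is a placeholder) starts the crawl and returns Scrapy's exit code. Using sys.executable rather than a bare "scrapy" avoids accidentally picking up a different installation on PATH, which is the same pitfall the interpreter setting above guards against.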
Upvotes: 0
Reputation: 12158
Put the code in a text file, name it something like your_spider.py, and run the spider using the runspider command:
scrapy runspider your_spider.py
Upvotes: 1
Reputation: 28206
You would typically start Scrapy using scrapy crawl, which hooks everything up for you and starts the crawl.
It also looks like your code is not properly indented (only one line is inside parse, when they all should be).
Upvotes: 0