Reputation: 135
I'm new to Python Scrapy. When I run the 'scrapy crawl name' command, the cmd window does something very busily, but in the end it doesn't spit out any HTML files. There seem to be lots of questions about Scrapy not working, but I couldn't find one like this case, so I'm posting this question.
This is my code:
import scrapy

class PostsSpider(scrapy.Spider):
    name = "posts"
    start_urls = [
        'https://blog.scrapinghub.com/page/1/',
        'https://blog.scrapinghub.com/page/2/'
    ]

    def parse(self, response):
        page = reponse.url.split('/')[-1]
        filename = 'posts-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
I went into the project folder with 'cd postscrape', where all these files and the venv live, activated the venv with 'call venv\Scripts\activate.bat', and finally ran 'scrapy crawl posts' in that same cmd window. As you can see, this code should spit out two HTML files, 'posts-1.html' and 'posts-2.html'. The command doesn't return any error message and seems to be doing something busily, but in the end it produces nothing. What's the problem?
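For reference, the exact command sequence (in the Windows cmd shell described above) was:
cd postscrape
call venv\Scripts\activate.bat
scrapy crawl posts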
Thank you, genius!
Upvotes: 1
Views: 157
Reputation: 106
You missed the letter 's' in 'response'. Because of that typo, every call to parse raises a NameError; Scrapy logs the traceback in the crawl output and keeps going, which is why the crawl looks busy but never writes any files.
page = reponse.url.split('/')[-1]
-->
page = response.url.split('/')[-1]
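With the typo fixed, the method would look like this. Note one further adjustment, flagged in the comment, which is my own observation and not part of the original question: the start URLs end in '/', so split('/')[-1] is an empty string and [-2] is what actually holds the page number.
def parse(self, response):
    # The URLs end with '/', so split('/')[-1] is '' and both pages
    # would be saved as 'posts-.html'; [-2] gives '1' or '2' instead.
    page = response.url.split('/')[-2]
    filename = 'posts-%s.html' % page
    with open(filename, 'wb') as f:
        f.write(response.body)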
Upvotes: 1
Reputation: 2678
There is no need to write items to a file manually. You can simply yield items from your parse method and pass the -o flag, as follows:
scrapy crawl some_spider -o some_file_name.json
You can find more details in the documentation.
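As a minimal sketch, a spider that yields items instead of writing files could look like this (the CSS selectors 'div.post-item' and 'h2 a' are assumptions about the blog's markup, not something taken from the question):
import scrapy

class PostsSpider(scrapy.Spider):
    name = "posts"
    start_urls = [
        'https://blog.scrapinghub.com/page/1/',
        'https://blog.scrapinghub.com/page/2/',
    ]

    def parse(self, response):
        # Yield one dict per post; the feed exporter serializes
        # whatever the spider yields, so no manual file handling is needed.
        for post in response.css('div.post-item'):
            yield {
                'title': post.css('h2 a::text').get(),
                'url': post.css('h2 a::attr(href)').get(),
            }
Running 'scrapy crawl posts -o posts.json' would then write every yielded item to posts.json.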
Upvotes: 1