Reputation: 782
I've built a crawler for a particular website using Scrapy. The crawler follows a URL if it matches one given regex, and calls the callback function if the URL matches another defined regex. The main purpose of the crawler is to extract all the required links within the website, rather than the contents of each link. Can anyone tell me how to print the list of all the crawled links? The code is:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class XyzSpider(CrawlSpider):
    name = "xyz"
    allowed_domains = ["xyz.com"]
    start_urls = ["http://www.xyz.com/Vacanciess"]
    rules = (
        Rule(SgmlLinkExtractor(allow=[regex2]), callback='parse_item'),
        Rule(SgmlLinkExtractor(allow=[regex1]), follow=True),
    )

    def parse_item(self, response):
        # sel = Selector(response)
        # title = sel.xpath("//h1[@class='no-bd']/text()").extract()
        # print title
        print response
The print title code works perfectly well. But when, as in the code above, I try to print the actual response, it returns:
[xyz] DEBUG: Crawled (200)<GET http://www.xyz.com/urlmatchingregex2> (referer: http://www.xyz.com/urlmatchingregex1)
<200 http://www.xyz.com/urlmatchingregex2>
Can anyone please help me retrieve the actual URL?
Upvotes: 0
Views: 395
Reputation: 12092
You can print response.url in the parse_item method to print the crawled URL. It is documented here.
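For example, a minimal sketch of how parse_item could both print each URL and collect the full list of crawled links (the crawled_urls attribute and the closed() method are illustrative additions, assuming your Scrapy version supports the spider_closed shortcut):

crawled_urls = []  # collects every link the spider visits (illustrative)

def parse_item(self, response):
    # response.url is the absolute URL of the page this callback received
    print response.url
    self.crawled_urls.append(response.url)

def closed(self, reason):
    # called once the spider finishes; prints all crawled links together
    print self.crawled_urls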
Upvotes: 1