sulav_lfc

Reputation: 782

Retrieving crawled URLs in Scrapy

I've built a crawler with Scrapy to crawl a particular website. The crawler follows a URL if it matches one given regex, and calls the callback function if the URL matches a second defined regex. The main purpose of the crawler is to extract all the required links within the website rather than the contents of each link. Can anyone tell me how to print the list of all the crawled links? The code is:

name = "xyz"
allowed_domains = ["xyz.com"]
start_urls = ["http://www.xyz.com/Vacanciess"] 
rules = (Rule(SgmlLinkExtractor(allow=[regex2]),callback='parse_item'),Rule(SgmlLinkExtractor(allow=[regex1]), follow=True),)



def parse_item(self, response):
 #sel = Selector(response)

 #title = sel.xpath("//h1[@class='no-bd']/text()").extract()
 #print title
 print response

The

print title 

code works perfectly well. But, as in the above code, if I try to print the actual response, it returns:

[xyz] DEBUG: Crawled (200)<GET http://www.xyz.com/urlmatchingregex2> (referer:  http://www.xyz.com/urlmatchingregex1)
<200 http://www.xyz.com/urlmatchingregex2>

Can anyone please help me retrieve the actual URL?

Upvotes: 0

Views: 395

Answers (1)

shaktimaan

Reputation: 12092

You can print response.url in the parse_item method to print the URL that was crawled. It is documented in Scrapy's Response documentation.
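For instance, here is a minimal sketch of how that could look, building on the spider from the question. Note that crawled_urls is a name invented for this example (not part of Scrapy), and closed() is the hook Scrapy calls when the spider finishes:

class XyzSpider(CrawlSpider):
    # name, allowed_domains, start_urls and rules as in the question

    crawled_urls = []  # invented for this example; accumulates the visited links

    def parse_item(self, response):
        # response.url holds the actual URL of the crawled page
        print response.url
        self.crawled_urls.append(response.url)

    def closed(self, reason):
        # called by Scrapy once the crawl finishes; dump the full list
        print self.crawled_urls

Each call to parse_item then prints the real URL instead of the response's repr, and the full list of matched links is printed once the crawl ends.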

Upvotes: 1
