Reputation: 3
I am trying to scrape "https://www.expireddomains.net/deleted-com-domains/"
for the expired domain data list.
I always get empty item fields for the following
class ExpiredSpider(BaseSpider):
name = "expired"
allowed_domains = ["example.com"]
start_urls = ['https://www.expireddomains.net/deleted-com-domains/']
def parse(self, response):
log.msg('parse(%s)' % response.url, level = log.DEBUG)
rows = response.xpath('//table[@class="base1"]/tbody/tr')
for row in rows:
item = DomainItem()
item['domain'] = row.xpath('td[1]/text()').extract()
item['bl'] = row.xpath('td[2]/text()').extract()
yield item
Can somebody point out what is wrong? Thanks.
Upvotes: 0
Views: 339
Reputation: 5240
As a first note, you should use scrapy.Spider
instead of BaseSpider which is deprecated
Secondly, .extract()
method returns a list rather than a single element.
This is how the item extraction should look like
item['domain'] = row.xpath('td[1]/text()').extract_first()
item['bl'] = row.xpath('td[2]/text()').extract_first()
Also,
You should use the built in python logging
library
import logging
logging.debug("parse("+response.url+")")
Upvotes: 0