Reputation: 591
I am practicing on a website with Python and Scrapy, but it gives this error:
DEBUG: Crawled (200) <GET http://careers.kfc.com.au/apply/?postcode=2000> (referer: None)
I can't understand why; it should work. The code is very short and I do not see any possible problems. Here is the code:
try:
    import scrapy
except ImportError:
    print("\nERROR IMPORTING THE NECESSARY LIBRARIES\n")

# File with all the links
# hellokitty = open('links.txt', 'r')
# Making a list with all the links
# yourResult = [line.rstrip() for line in hellokitty.readlines()]

class SpiderMan(scrapy.Spider):
    name = 'man spider'
    # Making start_urls equal to that list
    start_urls = ['http://careers.kfc.com.au/apply/?postcode=2000']

    def parse(self, response):
        SET_SELECTOR = 'div.jobs-in-your-area.fixed-search.fixed ul.accordion li.accordion-item'
        for attr in response.css(SET_SELECTOR):
            suberbname = 'a.accordion-title.location-title ::text'
            # ANOTHER FOR LOOP GOES HERE FOR THE INNER WORKINGS
            for nextattr in attr.css('ul.accordion li.accordion-item'):
                jobdestitle = 'a.accordion-title.job-title ::text'
                jobdes = 'div[class=job-description] div[id=description] p ::text'
                joblink = 'div[class=job-description] div[class=apply-now] a[class=button] ::attr(href)'
                yield {
                    'SUBERB_NAME': attr.css(suberbname).extract_first(),
                    'JOBTITLE': nextattr.css(jobdestitle).extract_first(),
                    'JOB_DESCRIP': nextattr.css(jobdes).extract(),
                    'JOB_DESCRIP_LINK': nextattr.css(joblink).extract_first(),
                }
And here is the log file:
2017-04-30 14:15:02 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-04-30 14:15:02 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True, 'LOG_FILE': 'kukur.txt'}
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-04-30 14:15:02 [scrapy.core.engine] INFO: Spider opened
2017-04-30 14:15:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-30 14:15:02 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-30 14:15:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://careers.kfc.com.au/apply/?postcode=2000> (referer: None)
2017-04-30 14:15:04 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-30 14:15:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 236,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 6478,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 4, 30, 8, 45, 4, 704154),
'log_count/DEBUG': 2,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2017, 4, 30, 8, 45, 2, 192149)}
2017-04-30 14:15:04 [scrapy.core.engine] INFO: Spider closed (finished)
Upvotes: 0
Views: 322
Reputation: 1947
There is an issue with the SET_SELECTOR, jobdes, and joblink declarations.
Here's the proper way to initialize them:
SET_SELECTOR = 'div.jobs-in-your-area'
jobdes = 'div.job-description div#description p ::text'
joblink = 'div.job-description div.apply-now a.button ::attr(href)'
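One possible reason the attribute form failed (an assumption, since the page's HTML is not shown here): in CSS, `div[class=job-description]` matches only when the whole class attribute is exactly that string, while `div.job-description` matches any div that carries that class among others. The sketch below mimics the two rules in plain Python on made-up attribute values. The function names are hypothetical and it does not call Scrapy.

```python
def matches_attr_equals(class_attr, value):
    """Mimic CSS [class=value]: the whole attribute must equal value."""
    return class_attr == value


def matches_class(class_attr, name):
    """Mimic CSS .name: name must be one of the whitespace-separated classes."""
    return name in class_attr.split()


# Hypothetical class attribute values, not taken from the real page.
samples = ["job-description", "job-description clearfix", "button apply"]

for value in samples:
    print(repr(value),
          matches_attr_equals(value, "job-description"),
          matches_class(value, "job-description"))
```

The two rules disagree only on the second sample, where an extra class is present. That is why the class form is usually the safer choice.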
Here's a run of your spider in scrapy shell, with a sample output:
>>> # SET_SELECTOR modified
>>> SET_SELECTOR = 'div.jobs-in-your-area'
>>>
>>> for attr in response.css(SET_SELECTOR):
...     suberbname = 'a.accordion-title.location-title ::text'
...
...     for nextattr in attr.css('ul.accordion li.accordion-item'):
...         jobdestitle = 'a.accordion-title.job-title ::text'
...         # jobdes and joblink modified
...         jobdes = 'div.job-description div#description p ::text'
...         joblink = 'div.job-description div.apply-now a.button ::attr(href)'
...
...         print('SUBERB_NAME: ', attr.css(suberbname).extract_first())
...         print('JOBTITLE: ', nextattr.css(jobdestitle).extract_first())
...         print('JOB_DESCRIP: ', nextattr.css(jobdes).extract())
...         print('JOB_DESCRIP_LINK: ', nextattr.css(joblink).extract_first())
...
SUBERB_NAME: Artarmon
JOBTITLE: Customer Service Team Member
JOB_DESCRIP: ['Company Information', 'KFC', " is the world's most popular chicken restaurant chain,\xa0specializing in our famous Original Recipe® fried chicken. It all started with one cook who created a finger lickin' good recipe more than ", '75', ' years ago, a list of secret herbs and spices scratched out on the back of the door to his kitchen. That cook was\xa0', 'Colonel Harland Sanders', ", of course, and today we still follow his formula for success, with real cooks breading and freshly preparing our delicious chicken by hand. Our aim is to put a smile on people's faces around the world and give every customer a special experience on each occasion. Our vision is that our jobs will be the best in the world for those committed to serving great food and looking after customers better than anyone else.", 'The Role', 'Customer Service Team Members are responsible for ensuring the provision of fresh, quality products, friendly and efficient service and maintaining clean and well-presented facilities for our valued customers!', 'Requirements/ key selection criteria', 'Experience', 'No experience necessary as full Training will be provided to all employees. Retail Traineeships are also available for employees who meet the required criteria.', 'Benefits:', "Working with KFC will give you financial independence, you'll receive recognition for your efforts and gain skills to set you on your career path. KFC is a place where good things happen as soon as you walk through the door.", 'Company Information', 'KFC', " is the world's most popular chicken restaurant chain,\xa0specializing in our famous Original Recipe® fried chicken. It all started with one cook who created a finger lickin' good recipe more than ", '75', ' years ago, a list of secret herbs and spices scratched out on the back of the door to his kitchen. 
That cook was\xa0', 'Colonel Harland Sanders', ", of course, and today we still follow his formula for success, with real cooks breading and freshly preparing our delicious chicken by hand. Our aim is to put a smile on people's faces around the world and give every customer a special experience on each occasion. Our vision is that our jobs will be the best in the world for those committed to serving great food and looking after customers better than anyone else.", 'The Role', 'Food Service Team Members consistently prepare high quality food products that create irresistible tastes for our customers whilst maintaining clean and well-presented facilities.', 'Requirements/ key selection criteria', 'Experience', 'No experience necessary as full training will be provided to all employees. Retail Traineeships are also available for employees who meet the required criteria.', 'Benefits:', "Working with KFC will give you financial independence, you'll receive recognition for your efforts and gain skills to set you on your career path. KFC is a place where good things happen as soon as you walk through the door."]
JOB_DESCRIP_LINK: http://applynow.net.au/jobs/KFC553-customer-service-team-member
Note: Use your browser's developer tools and scrapy shell when scraping; they make debugging a lot faster.
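One thing to keep in mind when combining the two: developer tools show the page after scripts have run, while Scrapy parses the raw HTML it downloaded, so a class you see in the browser may be missing from the response. Here is a minimal sketch of checking for that with only the standard library. The HTML snippet and the names `ClassCollector` and `classes_in_html` are hypothetical, and the idea that `fixed` is added by a script on this site is an assumption.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "class" and value:
                self.classes.update(value.split())


def classes_in_html(html):
    """Return the set of class names found in an HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical raw HTML, standing in for what the server actually returns.
raw_html = ('<div class="jobs-in-your-area fixed-search">'
            '<ul class="accordion"></ul></div>')

# Classes the selector relies on; report those absent from the raw HTML.
selector_classes = ["jobs-in-your-area", "fixed-search", "fixed"]
present = classes_in_html(raw_html)
missing = [name for name in selector_classes if name not in present]
print(missing)
```

Inside scrapy shell you could pass `response.text` to `classes_in_html` instead of the sample string. Any class that shows up in `missing` is a good candidate to drop from the selector.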
Upvotes: 3