Ken Ruto

Reputation: 43

Scrapy: no item output | DEBUG: Crawled (200)... (referer: None)

I am attempting to extract bid information from this site. I am a Scrapy newbie and a bit stuck as to why I don't get any output; instead, I get Crawled (200)... (referer: None) and no items. I am unable to figure out what I am missing or need to change. Can anyone please help me figure this out?

Thank you!!

Here is my spider code:

from ..items import GovernmentItem
import scrapy, urllib.parse

class GeorgiaSpider(scrapy.Spider):
    name = 'georgia'
    allowed_domains = ['ssl.doas.state.ga.us']

    def start_requests(self):
        url = 'https://ssl.doas.state.ga.us/gpr/'

        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for row in response.xpath('//*[@class="table table-striped table-bordered"]//tbody//tr'):
            item = GovernmentItem()

            item['description'] = row.xpath('./td[@class=" all"][2]').extract_first()
            item['begin_date'] = row.xpath('./td[@class=" desktop"]').extract_first()
            item['end_date'] = row.xpath('./td[@class="desktop tablet mobile sorting_1"]').extract_first()
            item['file_urls'] = row.xpath('./td[@class=" all"]/a/@href').extract_first()

            yield item
            

Here is my crawl log file:

    2021-07-23 05:49:13 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: government)
    2021-07-23 05:49:13 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.8.10 (default, Jun  2 2021, 10:49:15) - [GCC 9.4.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform Linux-5.8.0-63-generic-x86_64-with-glibc2.29
    2021-07-23 05:49:13 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
    2021-07-23 05:49:13 [scrapy.crawler] INFO: Overridden settings:
    {'BOT_NAME': 'government',
     'DOWNLOAD_DELAY': 1,
     'NEWSPIDER_MODULE': 'government.spiders',
     'SPIDER_MODULES': ['government.spiders']}
    2021-07-23 05:49:13 [scrapy.extensions.telnet] INFO: Telnet Password: 1196e88aa45a90c1
    2021-07-23 05:49:13 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.memusage.MemoryUsage',
     'scrapy.extensions.feedexport.FeedExporter',
     'scrapy.extensions.logstats.LogStats']
    2021-07-23 05:49:13 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2021-07-23 05:49:13 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2021-07-23 05:49:13 [scrapy.middleware] INFO: Enabled item pipelines:
    ['government.pipelines.GovernmentPipeline',
     'scrapy.pipelines.files.FilesPipeline']
    2021-07-23 05:49:13 [scrapy.core.engine] INFO: Spider opened
    2021-07-23 05:49:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2021-07-23 05:49:13 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2021-07-23 05:49:14 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ssl.doas.state.ga.us/gpr/unsupported?browser=> from <GET https://ssl.doas.state.ga.us/gpr/>
    2021-07-23 05:49:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ssl.doas.state.ga.us/gpr/unsupported?browser=> (referer: None)
    2021-07-23 05:49:15 [scrapy.core.engine] INFO: Closing spider (finished)
    2021-07-23 05:49:15 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 468,
     'downloader/request_count': 2,
     'downloader/request_method_count/GET': 2,
     'downloader/response_bytes': 6169,
     'downloader/response_count': 2,
     'downloader/response_status_count/200': 1,
     'downloader/response_status_count/302': 1,
     'elapsed_time_seconds': 1.564505,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2021, 7, 23, 10, 49, 15, 561300),
     'log_count/DEBUG': 2,
     'log_count/INFO': 10,
     'memusage/max': 55824384,
     'memusage/startup': 55824384,
     'response_received_count': 1,
     'scheduler/dequeued': 2,
     'scheduler/dequeued/memory': 2,
     'scheduler/enqueued': 2,
     'scheduler/enqueued/memory': 2,
     'start_time': datetime.datetime(2021, 7, 23, 10, 49, 13, 996795)}
    2021-07-23 05:49:15 [scrapy.core.engine] INFO: Spider closed (finished)

Upvotes: 0

Views: 1494

Answers (3)

Shivam

Reputation: 620

As SuperUser mentioned, your original URL is being redirected because the website expects the request to come from a real browser. To mimic browser behaviour in Scrapy, pass a user-agent either via settings.py or as a header in your spider file; the page source HTML will then be returned.
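For example, a minimal settings.py change along these lines (the UA string is just one example of a desktop-Chrome value, not something this site specifically requires):

```python
# settings.py -- send a browser-like User-Agent with every request.
# The exact string below is only an example desktop-Chrome value;
# any reasonably current desktop-browser UA should work.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"

# Alternatively, set it per spider inside the spider class:
# custom_settings = {"USER_AGENT": USER_AGENT}
```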

Even then, your XPath won't work, because the data you are looking for is generated dynamically. You should reproduce the request with your browser's dev tools to find the underlying API, then use that to get the desired results.

The code below returns a JSON response. For demonstration, I have extracted only one field; you can get the other fields the same way.

Code

import scrapy
import json
from ..items import GovernmentItem

class Test(scrapy.Spider):
    name = 'test'

    headers = {
        "authority": "ssl.doas.state.ga.us",
        "pragma": "no-cache",
        "cache-control": "no-cache",
        "sec-ch-ua": "\"Chromium\";v=\"92\", \" Not A;Brand\";v=\"99\", \"Google Chrome\";v=\"92\"",
        "accept": "application/json, text/javascript, */*; q=0.01",
        "x-requested-with": "XMLHttpRequest",
        "sec-ch-ua-mobile": "?0",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "origin": "https://ssl.doas.state.ga.us",
        "sec-fetch-site": "same-origin",
        "sec-fetch-mode": "cors",
        "sec-fetch-dest": "empty",
        "referer": "https://ssl.doas.state.ga.us/gpr/",
        "accept-language": "en-US,en;q=0.9"
    }

    body = 'draw=1&columns%5B0%5D%5Bdata%5D=function&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=function&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=title&columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=agencyName&columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=postingDateStr&columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=true&columns%5B4%5D%5Borderable%5D=true&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=closingDateStr&columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=true&columns%5B5%5D%5Borderable%5D=true&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=function&columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=true&columns%5B6%5D%5Borderable%5D=false&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B7%5D%5Bdata%5D=status&columns%5B7%5D%5Bname%5D=&columns%5B7%5D%5Bsearchable%5D=true&columns%5B7%5D%5Borderable%5D=false&columns%5B7%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=false&order%5B0%5D%5Bcolumn%5D=5&order%5B0%5D%5Bdir%5D=asc&start=0&length=50&search%5Bvalue%5D=&search%5Bregex%5D=false&responseType=ALL&eventStatus=OPEN&eventIdTitle=&govType=ALL&govEntity=&eventProcessType=ALL&dateRangeType=&rangeStartDate=&rangeEndDate=&isReset=false&persisted=&refreshSearchData=false'

    def start_requests(self):
        url = 'https://ssl.doas.state.ga.us/gpr/eventSearch'
        yield scrapy.Request(url=url, method='POST', headers=self.headers, body=self.body, callback=self.parse)

    def parse(self, response):
        data = json.loads(response.body)
        for row in data.get('data'):
            # Create a fresh item per row; reusing one item object across
            # iterations would keep mutating the same instance.
            item = GovernmentItem()
            item['title'] = row.get('title')
            yield item
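The same JSON rows carry the other columns named in the request body (agencyName, postingDateStr, closingDateStr), so pulling extra fields is just more `.get()` calls. A hypothetical helper (the sample row below is invented for illustration, not real response data):

```python
# Sketch: extract several columns from one JSON row of the eventSearch
# response. The key names come from the request body's column definitions.
def row_to_item(row):
    return {
        "title": row.get("title"),
        "agency": row.get("agencyName"),
        "begin_date": row.get("postingDateStr"),
        "end_date": row.get("closingDateStr"),
    }

# Made-up sample row, shaped like one element of the response's "data" list.
sample = {
    "title": "Ford Transit Connect Cargo Van",
    "agencyName": "Example Agency",
    "postingDateStr": "Jul 12, 2021 @ 05:19 PM",
    "closingDateStr": "Jul 26, 2021 @ 09:00 AM",
}
print(row_to_item(sample)["begin_date"])  # -> Jul 12, 2021 @ 05:19 PM
```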

Upvotes: 0

Md. Fazlul Hoque

Reputation: 16187

This is the full working solution:

import scrapy
import json
# base_url = https://ssl.doas.state.ga.us/gpr/

class GeorgiaSpider(scrapy.Spider):

    name = 'georgia'
    body = 'draw=1&columns%5B0%5D%5Bdata%5D=function&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=function&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=title&columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=agencyName&columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=postingDateStr&columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=true&columns%5B4%5D%5Borderable%5D=true&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=closingDateStr&columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=true&columns%5B5%5D%5Borderable%5D=true&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=function&columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=true&columns%5B6%5D%5Borderable%5D=false&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B7%5D%5Bdata%5D=status&columns%5B7%5D%5Bname%5D=&columns%5B7%5D%5Bsearchable%5D=true&columns%5B7%5D%5Borderable%5D=false&columns%5B7%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=false&order%5B0%5D%5Bcolumn%5D=5&order%5B0%5D%5Bdir%5D=asc&start=0&length=50&search%5Bvalue%5D=&search%5Bregex%5D=false&responseType=ALL&eventStatus=OPEN&eventIdTitle=&govType=ALL&govEntity=&eventProcessType=ALL&dateRangeType=&rangeStartDate=&rangeEndDate=&isReset=false&persisted=&refreshSearchData=false'

    def start_requests(self):
        yield scrapy.Request(
            url='https://ssl.doas.state.ga.us/gpr/eventSearch',
            callback=self.parse,
            body=self.body,
            method="POST",
            headers={
                'authority': 'ssl.doas.state.ga.us',
                'path': '/gpr/eventSearch',
                'scheme': 'https',
                'accept': 'application/json, text/javascript, */*; q=0.01',
                'accept-encoding': 'gzip, deflate, br',
                'accept-language': 'en-US,en;q=0.9,bn;q=0.8,es;q=0.7,ar;q=0.6',
                'content-length': '2030',
                'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
                'origin': 'https://ssl.doas.state.ga.us',
                'referer': 'https://ssl.doas.state.ga.us/gpr/',
                'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"',
                'sec-ch-ua-mobile': '?0',
                'sec-fetch-dest': 'empty',
                'sec-fetch-mode': 'cors',
                'pragma': 'no-cache',
                'cache-control': 'no-cache',
                'sec-fetch-site': 'same-origin',
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
                'x-requested-with': 'XMLHttpRequest'
            }
        )

    def parse(self, response):
        data = json.loads(response.body)
        for resp in data['data']:
            yield {
                'title': resp['title'],
                'begin_date': resp['postingDateStr'],
                'end_date': resp['closingDateStr']
            }
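A hypothetical follow-up: the request body above is a DataTables-style query whose `start=0&length=50` fields control paging, so later pages can presumably be fetched by rewriting `start`. A small helper, assuming the endpoint honors those fields (this is a sketch, not something verified against the site):

```python
# Sketch: rewrite the `start`/`length` paging fields of a DataTables-style
# form-encoded body. Assumes the server pages on those two parameters.
from urllib.parse import parse_qsl, urlencode

def page_body(body, page, length=50):
    params = dict(parse_qsl(body, keep_blank_values=True))
    params["start"] = str(page * length)
    params["length"] = str(length)
    return urlencode(params)

# Shortened example body; the real one carries all the column definitions too.
print(page_body("start=0&length=50&draw=1", 2))  # -> start=100&length=50&draw=1
```

In a spider, each `parse` call could compare `start + length` against the response's `recordsTotal`-style count (if present) and yield the next POST until the data runs out.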
              

Output:

2021-07-25 10:48:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-07-25 10:48:04 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ssl.doas.state.ga.us/gpr/eventSearch> (referer: https://ssl.doas.state.ga.us/gpr/)
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>
{'title': 'Ford Transit Connect Cargo Van', 'begin_date': 'Jul 12, 2021 @ 05:19 PM', 'end_date': 'Jul 26, 2021 @ 09:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>
{'title': '2020 CDBG Water System Improvement', 'begin_date': 'Jun 14, 2021 @ 11:58 AM', 'end_date': 'Jul 26, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>
{'title': 'Fire Station 20 Renovations', 'begin_date': 'Jun 30, 2021 @ 08:54 AM', 'end_date': 'Jul 26, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Food Service Produce', 'begin_date': 'Jun 28, 2021 @ 07:24 AM', 'end_date': 'Jul 26, 2021 @ 10:00 AM'}   
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'LMIG 2021 ROAD RESURFACING PORJECT', 'begin_date': 'Jul 01, 2021 @ 02:36 PM', 'end_date': 'Jul 26, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'MGA Robinson R44 Helicopter Cadet Overhaul', 'begin_date': 'Jul 07, 2021 @ 12:00 PM', 'end_date': 'Jul 26, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Renovations to Old Hickory Flat Gymnasium', 'begin_date': 'Jun 24, 2021 @ 04:51 PM', 'end_date': 'Jul 26, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'eCard Services', 'begin_date': 'Jul 09, 2021 @ 12:00 PM', 'end_date': 'Jul 26, 2021 @ 12:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Ford Police Interceptor', 'begin_date': 'Jul 01, 2021 @ 12:17 PM', 'end_date': 'Jul 26, 2021 @ 02:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>
{'title': 'North End Roadway Safety Analysis', 'begin_date': 'Jun 23, 2021 @ 10:18 AM', 'end_date': 'Jul 26, 2021 @ 02:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Offset Printing & Finishing Services for Bulldog Print + Design', 'begin_date': 'Jun 30, 2021 @ 03:15 PM', 'end_date': 'Jul 26, 2021 @ 02:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>
{'title': 'RECREATION, PARKS, HISTORIC AND CULTURAL AFFAIRS 5', 'begin_date': 'Jun 25, 2021 @ 10:37 AM', 'end_date': 'Jul 26, 2021 @ 02:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Surgical Sterilization of Adoptable Pets', 'begin_date': 'Jun 24, 2021 @ 03:12 PM', 'end_date': 'Jul 26, 2021 @ 03:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Remount Ambulance', 'begin_date': 'Jun 23, 2021 @ 10:43 AM', 'end_date': 'Jul 26, 2021 @ 04:00 PM'}      
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'DOCO CDBG Rehab/Elevation/Reconstruction', 'begin_date': 'Jun 23, 2021 @ 11:12 AM', 'end_date': 'Jul 26, 2021 @ 05:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'T32-D1-Veg Rem-SR 211 Barrow-121766', 'begin_date': 'Jul 08, 2021 @ 04:58 PM', 'end_date': 'Jul 26, 2021 @ 05:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Carrollton City Hall Renovation & Addition', 'begin_date': 'Jun 25, 2021 @ 09:14 AM', 'end_date': 'Jul 27, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'HUB Transformation Project', 'begin_date': 'Jun 17, 2021 @ 05:01 PM', 'end_date': 'Jul 27, 2021 @ 10:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Extrication Equipment', 'begin_date': 'Jun 22, 2021 @ 12:08 AM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'}  
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Goodyear Pressure Improvement Natural Gas Main Ext', 'begin_date': 'Jul 01, 2021 @ 02:46 PM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Septic Tank and Grease Trap Pumping', 'begin_date': 'Jun 29, 2021 @ 05:53 PM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'T-Hangar Area-Sitework', 'begin_date': 'Jun 29, 2021 @ 08:27 AM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'} 
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Toccoa Airport - Apron / Ramp Seal Coat', 'begin_date': 'Jun 30, 2021 @ 04:03 PM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>
{'title': 'UWG Nursing Building Parking Lot Expansion', 'begin_date': 'Jun 25, 2021 @ 09:08 AM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'WYCKOFF RAW WATER PIPELINE REPLACEMENT', 'begin_date': 'Jun 25, 2021 @ 09:11 AM', 'end_date': 'Jul 27, 2021 @ 11:00 AM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Purchase 1 One Ton 4x2 Extended Cab Truck', 'begin_date': 'Jul 09, 2021 @ 09:22 AM', 'end_date': 'Jul 27, 2021 @ 12:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Purchase One 4x2 Two Ton Crew Cab Truck', 'begin_date': 'Jul 09, 2021 @ 09:33 AM', 'end_date': 'Jul 27, 2021 @ 12:00 PM'}
2021-07-25 10:48:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ssl.doas.state.ga.us/gpr/eventSearch>    
{'title': 'Purchase One 4x4 Crew Cab Two Ton Truck', 'begin_date': 'Jul 09, 2021 @ 09:28 AM', 'end_date': 'Jul 27, 2021 @ 12:00 PM'}

Upvotes: 2

SuperUser

Reputation: 4822

As you can see, you got the response for https://ssl.doas.state.ga.us/gpr/unsupported?browser=, so set your USER_AGENT accordingly (for example, to a Windows machine with the Chrome browser).

Change (and uncomment) USER_AGENT in settings.py to:

USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"

Upvotes: 1
