I am trying to measure the throughput of the system in Scrapy, and for that I need to find out when an HTTP request was fired and when it completed.
Any directions to a solution are highly appreciated.
You could use a custom downloader middleware:
import logging
from datetime import datetime


class MeasureMiddleware:
    def __init__(self):
        # (url, start time) pairs for requests that are still in flight
        self.requests = []

    def process_request(self, request, spider):
        # store the time and url of every outgoing request
        self.requests.append((request.url, datetime.now()))

    def process_response(self, request, response, spider):
        # for every response, check whether one of the tracked requests came back;
        # if so, log its start time and the current time
        filtered_requests = []
        for url, start_date in self.requests:
            if url == request.url:
                logging.info(f'request {url} {start_date} - {datetime.now()}')
            else:
                filtered_requests.append((url, start_date))
        self.requests = filtered_requests
        return response
Then activate the downloader middleware in your project's settings.py:
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.MeasureMiddleware': 543,
}
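As a side note, a common variation (a rough sketch, not part of the answer above) is to carry the start time on the request itself via request.meta instead of keeping a shared list; this sidesteps the URL matching and works even when the same URL is fetched more than once. The class name MetaTimingMiddleware and the start_time meta key are just illustrative:

import logging
from datetime import datetime


class MetaTimingMiddleware:
    def process_request(self, request, spider):
        # stash the start time on the request itself (hypothetical meta key)
        request.meta['start_time'] = datetime.now()

    def process_response(self, request, response, spider):
        # read the start time back from the same request object
        start_time = request.meta.get('start_time')
        if start_time is not None:
            elapsed = datetime.now() - start_time
            logging.info('request %s took %s', request.url, elapsed)
        return response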
It's worth noting that, due to the asynchronous nature of Scrapy, this won't be millisecond-accurate, but it should be accurate enough to give a general overview.
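Also worth mentioning (not part of the middleware approach above): Scrapy records a download_latency value in the request's meta once a response has been downloaded, which you can read without any extra bookkeeping. A minimal sketch of logging it from a spider callback, where TimingSpider is just an illustrative spider:

import scrapy


class TimingSpider(scrapy.Spider):
    name = 'timing'
    start_urls = ['https://example.com']

    def parse(self, response):
        # 'download_latency' is filled in by Scrapy's downloader once the
        # response has been fetched; it measures the network fetch time
        latency = response.meta.get('download_latency')
        self.logger.info('fetched %s in %s seconds', response.url, latency)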