Robert Eduard Antal

Reputation: 43

Multi threading script for HTTP status codes

Hi Stackoverflow community,

I would like to create a script that uses multithreading to issue a high number of parallel requests for HTTP status codes against a large list of URLs (more than 30k vhosts).

The requests can be executed from the same server where the websites are hosted.

I have been using multithreaded curl requests, but I'm not really satisfied with the results: a complete check of 30k hosts takes more than an hour.

Does anyone have tips, or is there a more performant way to do this?

Upvotes: 1

Views: 786

Answers (2)

Robert Eduard Antal

Reputation: 43

After testing some of the available solutions, the simplest and fastest approach was to use webchk.

webchk is a command-line tool, written in Python 3, for checking the HTTP status codes and response headers of URLs.

The speed was impressive and the output was clean: it parsed 30k vhosts in about 2 minutes.

https://webchk.readthedocs.io/en/latest/index.html

https://pypi.org/project/webchk/

Upvotes: 2

Charles Landau

Reputation: 4275

If you're looking for parallelism and multi-threaded approaches to making HTTP requests with Python, you might start with the aiohttp library or the popular requests package. Multithreading is available in the standard library via threading or concurrent.futures.ThreadPoolExecutor, while multiprocessing provides process-based parallelism.
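For instance, a thread pool suits this workload well because status checks are I/O-bound. Here is a minimal standard-library sketch of the idea; the URLs, timeout, and worker count are placeholder assumptions you would tune for your 30k-host list (swap in `requests` if you prefer its API):

```python
# Sketch: check HTTP status codes for a list of URLs with a thread pool.
# Standard library only; URLS and max_workers are illustrative values.
import urllib.request
import urllib.error
from concurrent.futures import ThreadPoolExecutor

URLS = ["http://example.com", "http://example.org", "http://example.net"]

def check_status(url, timeout=5):
    """Return (url, status), where status is an HTTP code or an error string."""
    req = urllib.request.Request(url, method="HEAD")  # HEAD skips the body
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return url, resp.status
    except urllib.error.HTTPError as exc:
        return url, exc.code              # 4xx/5xx responses still carry a code
    except (urllib.error.URLError, OSError) as exc:
        return url, f"ERROR: {exc}"       # DNS failure, timeout, refused, ...

# Many workers are reasonable here because each thread mostly waits on I/O.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(check_status, URLS))

for url, status in results:
    print(url, status)
```

Since the websites are hosted on the same server, raising `max_workers` well above 50 may be viable, but profile rather than guess.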

Here's a discussion of rate limiting with the aiohttp client: aiohttp: rate limiting parallel requests
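The core pattern from that discussion is a semaphore that caps how many requests are in flight at once. Below is a stdlib-only sketch of it; the `fetch` coroutine is a stand-in for a real aiohttp `session.get` call, and the concurrency limit and URL list are illustrative assumptions:

```python
# Sketch of semaphore-based rate limiting with asyncio.
# `fetch` simulates a network round-trip so the pattern runs anywhere;
# in practice it would await an aiohttp request and return its status.
import asyncio

CONCURRENCY = 100  # tune to what the server tolerates

async def fetch(url):
    await asyncio.sleep(0.01)  # placeholder for the real request
    return url, 200            # a real fetch would return the response status

async def bounded_fetch(sem, url):
    async with sem:            # waits while CONCURRENCY fetches are running
        return await fetch(url)

async def main(urls):
    sem = asyncio.Semaphore(CONCURRENCY)
    return await asyncio.gather(*(bounded_fetch(sem, u) for u in urls))

urls = [f"http://host{i}.example" for i in range(500)]
results = asyncio.run(main(urls))
print(len(results), "checked")
```

With a single event loop doing the waiting, one process can keep thousands of connections pending without one thread per request.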

Here's a discussion about making multiprocessing work with requests: https://stackoverflow.com/a/27547938/10553976

Performance comes down to your implementation, so be sure to profile your attempts and compare them against your current approach.

Upvotes: 0
