Reputation: 514
I want to get data from multiple pages (about 10,000 of them) containing number arrays. Fetching them one by one takes very long, and I'm new to Python, so I don't know much about multithreading and asynchronism in this language.
The code works fine and retrieves all the expected data, but it takes several minutes to finish. I know it could probably be faster if I made more than one request at a time.
import http.client
import json


def get_all_data():
    connection = http.client.HTTPConnection("localhost:5000")
    page = 1
    data = {}
    while True:
        try:
            api_url = f'/api/numbers?page={page}'
            connection.request('GET', api_url)
            response = connection.getresponse()
            if response.status == 200:
                data[f'{page}'] = json.loads(response.read())['numbers']
                items_returned = len(data[f'{page}'])
                print(f'Please wait, fetching data... Request: {page} -- Items returned: {items_returned}')
                page += 1
                if items_returned == 0:
                    break
        except Exception:
            connection.close()
            break
    print('All requests completed!')
    return data
How can I refactor this code to make multiple requests concurrently instead of one by one?
Upvotes: 3
Views: 1313
Reputation: 3337
Basically there are three ways of doing this kind of job: multithreading, multiprocessing, and async. As ACE mentioned, the page parameter exists because the server generates the pages dynamically, and the number of pages may change over time as the database is updated. The easiest approach is a batch job: wrap each batch in a try/except block and handle the last, partially filled batch separately. You can make the number of requests per batch a variable and experiment with different values, as in the sketch below.
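For example, here is a minimal sketch of the batch idea using a thread pool. It assumes the localhost:5000 endpoint from the question and that pages past the end return an empty numbers list, as the original break condition suggests; BATCH_SIZE is an arbitrary value to tune.

    import json
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    BASE_URL = 'http://localhost:5000/api/numbers?page={}'  # endpoint from the question
    BATCH_SIZE = 50  # number of concurrent requests per batch; tune this


    def fetch_page(page):
        # Fetch a single page and return its list of numbers ([] on error or empty page).
        try:
            with urllib.request.urlopen(BASE_URL.format(page)) as response:
                return json.loads(response.read())['numbers']
        except Exception:
            return []


    def get_all_data():
        data = {}
        page = 1
        with ThreadPoolExecutor(max_workers=BATCH_SIZE) as executor:
            while True:
                batch = range(page, page + BATCH_SIZE)
                results = list(executor.map(fetch_page, batch))
                for p, numbers in zip(batch, results):
                    if numbers:
                        data[f'{p}'] = numbers
                # An empty page means we have passed the last page of data.
                if any(len(numbers) == 0 for numbers in results):
                    break
                page += BATCH_SIZE
        return data

Each batch fires BATCH_SIZE requests at once, and the loop stops as soon as a batch contains an empty page, at the cost of a few wasted requests past the end.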
Upvotes: 1
Reputation: 482
Your page parameter (the producer) is dynamic and depends on the previous request (the consumer). Unless you can separate the producer, you can't use coroutines or multithreading.
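For illustration, a minimal sketch of what separating the producer could look like, assuming (hypothetically) that the total number of pages were known up front; the question's API may not provide this, and aiohttp is a third-party package (pip install aiohttp).

    import asyncio
    import aiohttp  # third-party HTTP client with asyncio support

    BASE_URL = 'http://localhost:5000'  # endpoint from the question


    async def fetch_page(session, page):
        # Fetch one page and return (page, numbers).
        async with session.get(f'{BASE_URL}/api/numbers?page={page}') as response:
            payload = await response.json()
            return page, payload['numbers']


    async def get_all_data(total_pages):
        # total_pages must be known in advance -- this is the "separated producer".
        # In practice you would also cap concurrency (e.g. with a semaphore)
        # rather than launch 10,000 requests at once.
        async with aiohttp.ClientSession() as session:
            tasks = [fetch_page(session, page) for page in range(1, total_pages + 1)]
            results = await asyncio.gather(*tasks)
        return {f'{page}': numbers for page, numbers in results}

    # data = asyncio.run(get_all_data(total_pages=10000))

Without such a known page count, you are stuck with the guess-and-check batching described in the other answer.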
Upvotes: 0