DHANANJAY CHAUBEY

Reputation: 117

Run Parallel Request session in python

I am trying to open multiple web sessions and save the data into CSV. I have written my code using a for loop and requests.get, but it takes too long to access 90 web locations. Can anyone show me how to run the whole process in parallel for loc_var?

The code works fine; the only issue is that it runs one loc_var at a time, which takes a long time.

I want to request all of the loc_var URLs in the for loop in parallel and then do the CSV write.

Below is the Code:

import pandas as pd
import numpy as np
import os
import requests
import datetime
import zipfile

t = datetime.date.today() - datetime.timedelta(2)
server = [("A", "web1", ":5000", "username=usr&password=p7Tdfr")]
'''List of all web_ips'''
web_1 = ["Web1","Web2","Web3","Web4","Web5","Web6","Web7","Web8","Web9","Web10","Web11","Web12","Web13","Web14","Web15"]
'''List of all locations'''
loc_var = ["post1","post2","post3","post4","post5","post6","post7","post8","post9","post10","post11","post12","post13","post14","post15","post16","post17","post18"]

for name, web, port, usr in server:
    login_url = 'http://'+web+port+'/api/v1/system/login/?'+usr
    print(login_url)
    s = requests.Session()
    login_response = s.post(login_url)
    print("login Response", login_response)
    # Start accessing the web for each loc_var entry
    for mkt in loc_var:
        # Output is a CSV file
        com_actions_url = 'http://'+web+port+'/api/v1/3E+date(%5C%22'+str(t)+'%5C%22)and+location+%3D%3D+%27'+mkt+'%27%22&page_size=-1&format=%22csv%22'
        print("com_action_url", com_actions_url)
        r = s.get(com_actions_url)
        print("action", r)
        if r.ok:
            with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                f.write(r.content)

        # If the location is not accessible, retry against the servers in web_1
        if not r.ok:
            while not r.ok:
                for web_2 in web_1:
                    login_url = 'http://'+web_2+port+'/api/v1/system/login/?'+usr
                    com_actions_url = 'http://'+web_2+port+'/api/v1/3E+date(%5C%22'+str(t)+'%5C%22)and+location+%3D%3D+%27'+mkt+'%27%22&page_size=-1&format=%22csv%22'
                    login_response = s.post(login_url)
                    print("login Response", login_response)
                    print("com_action_url", com_actions_url)
                    r = s.get(com_actions_url)
                    if r.ok:
                        with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                            f.write(r.content)
                        break

Upvotes: 6

Views: 13024

Answers (1)

J. Taylor

Reputation: 4855

There are multiple approaches you can take to make concurrent HTTP requests. Two that I've used are (1) multiple threads with concurrent.futures.ThreadPoolExecutor, or (2) sending the requests asynchronously with asyncio/aiohttp.

To use a thread pool to send your requests in parallel, you would first build the list of URLs you want to fetch (in your case, the login_urls and com_action_urls), and then request all of them concurrently as follows:

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch(url):
    # Catch HTTP errors/exceptions (e.g. requests.RequestException) here
    page = requests.get(url)
    return page.text

urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com']  # Create a list of urls

with ThreadPoolExecutor(max_workers=5) as pool:
    for page in pool.map(fetch, urls):
        # Do whatever you want with the results ...
        print(page[0:100])
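
Applied to your loop, a minimal sketch might look like the following. It assumes a single shared requests.Session (sharing one across threads works in practice for plain GETs, though it isn't officially documented as thread-safe), reuses the URL pattern and output directory from your code, and leaves out the web_1 fallback logic:

from concurrent.futures import ThreadPoolExecutor
import datetime
import os
import requests

t = datetime.date.today() - datetime.timedelta(2)
web, port, usr = "web1", ":5000", "username=usr&password=p7Tdfr"
loc_var = ["post1", "post2", "post3"]  # shortened for the example

s = requests.Session()
s.post('http://' + web + port + '/api/v1/system/login/?' + usr)

def fetch_csv(mkt):
    # Build the same report URL as in your loop and download it
    url = ('http://' + web + port + '/api/v1/3E+date(%5C%22' + str(t) +
           '%5C%22)and+location+%3D%3D+%27' + mkt +
           '%27%22&page_size=-1&format=%22csv%22')
    return mkt, s.get(url)

with ThreadPoolExecutor(max_workers=10) as pool:
    for mkt, r in pool.map(fetch_csv, loc_var):
        if r.ok:
            with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                f.write(r.content)

Writing the files in the main thread, as above, keeps the disk I/O simple; only the HTTP requests run in parallel.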

Using asyncio/aiohttp is generally faster than the threaded approach above, but the learning curve is steeper. Here is a simple example (Python 3.7+):

import asyncio
import aiohttp

urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com']

async def fetch(session, url):
    # Catch HTTP errors/exceptions here
    async with session.get(url) as resp:
        return await resp.text()

async def fetch_concurrent(urls):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for u in urls:
            tasks.append(asyncio.create_task(fetch(session, u)))

        for result in asyncio.as_completed(tasks):
            page = await result
            # Do whatever you want with the results
            print(page[0:100])

asyncio.run(fetch_concurrent(urls))
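
If you do push the asyncio version to a large number of URLs, a common refinement (not part of the example above, just a sketch) is to bound how many requests are in flight at once with an asyncio.Semaphore:

import asyncio
import aiohttp

async def fetch(session, sem, url):
    async with sem:  # at most `limit` requests in flight at a time
        async with session.get(url) as resp:
            return await resp.text()

async def fetch_concurrent(urls, limit=10):
    sem = asyncio.Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch(session, sem, u)) for u in urls]
        for result in asyncio.as_completed(tasks):
            print((await result)[0:100])

asyncio.run(fetch_concurrent(['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com']))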

But unless you are going to be making a huge number of requests, the threaded approach will likely be sufficient (and way easier to implement).

Upvotes: 21
