Rafael

Reputation: 3196

Maintaining Dictionary Integrity While Running it Through Multithread Process

I sped up a process by using a multithreaded function, but I need to maintain the relationship between the output and the input.

import requests
import pprint
import threading

ticker = ['aapl', 'googl', 'nvda']
url_array = []
ev_array = []  # shared list that the threads append to

for i in ticker:
    url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/' + i + '?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com'
    url_array.append(url)


def fetch_ev(url):
    urlHandler = requests.get(url)
    data = urlHandler.json()
    ev_single = data['quoteSummary']['result'][0]['defaultKeyStatistics']['enterpriseValue']['raw']
    ev_array.append(ev_single)  # makes array of enterprise values

threads = [threading.Thread(target=fetch_ev, args=(url,)) for url in
           url_array]  # calls multi thread that pulls enterprise value


for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

pprint.pprint(dict(zip(ticker, ev_array)))

Sample output of the code:

1) {'aapl': '30.34B', 'googl': '484.66B', 'nvda': '602.66B'}

2) {'aapl': '484.66B', 'googl': '30.34B', 'nvda': '602.66B'}

I need the value to be matched up with the correct ticker.

Edit: I know dictionaries do not preserve order. Sorry, perhaps I was a little (very) unclear in my question. I have an array of ticker symbols that matches the order of my URL inputs. After running fetch_ev, I want to combine these ticker symbols with the matching enterprise value (ev_single). The order in which they are stored does not matter; however, the pairings (key-value pairs), i.e. which value is stored with which ticker, are very important.

Edit2 (MCVE): I changed the code to a simpler version of what I had, one that shows the problem better. Sorry it's a little more complicated than I would have wanted.

Upvotes: 2

Views: 94

Answers (1)

martineau

Reputation: 123501

To make it easy to maintain the correspondence between input and output, the ev_array can be preallocated so it's the same size as the ticker array, and the fetch_ev() thread function can be given an extra argument specifying the index of the location in that array to store the value fetched.

To maintain the integrity of the ev_array, a threading.RLock was added to prevent concurrent access to the shared resource, which might otherwise be written to simultaneously by more than one thread. (Since each thread now stores its result directly at the index passed to fetch_ev(), this may not be strictly necessary.)

I don't know the correct ticker ↔ enterprise value correspondence, so I can't verify the results that doing this produces:

{'aapl': 602658308096L, 'googl': 484659986432L, 'nvda': 30338199552L}

but at least they're now the same each time it's run.

import requests
import pprint
import threading

def fetch_ev(index, url):  # index parameter added
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    ev_single = data['quoteSummary']['result'][0][
                     'defaultKeyStatistics']['enterpriseValue']['raw']
    with ev_array_lock:
        ev_array[index] = ev_single  # store enterprise value obtained

tickers = ['aapl', 'googl', 'nvda']
ev_array = [None] * len(tickers)  # preallocate to hold results
ev_array_lock = threading.RLock()  # to synchronize concurrent array access
urls = ['https://query2.finance.yahoo.com/v10/finance/quoteSummary/{}'
        '?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US'
        '&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents'
        '&corsDomain=finance.yahoo.com'.format(symbol)
           for symbol in tickers]
threads = [threading.Thread(target=fetch_ev, args=(i, url))
                for i, url in enumerate(urls)]  # activities to obtain ev's

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
pprint.pprint(dict(zip(tickers, ev_array)))
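
As an alternative to passing an index into each thread, here is a minimal sketch using concurrent.futures.ThreadPoolExecutor (this assumes Python 3, and it is a different technique from the preallocated-list approach above). Executor.map() returns results in the order of its inputs, so zipping them with the tickers keeps each pairing intact. To keep the sketch runnable without network access, fetch_ev() here is a hypothetical stand-in that sleeps instead of calling requests.get(), and FAKE_DELAYS and the returned placeholder strings are made up for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-ticker delays so the simulated fetches finish out of order.
FAKE_DELAYS = {'aapl': 0.3, 'googl': 0.1, 'nvda': 0.2}

def fetch_ev(symbol):
    # Stand-in for the real HTTP request and JSON lookup.
    time.sleep(FAKE_DELAYS[symbol])
    return 'ev-for-' + symbol  # placeholder value, not real data

tickers = ['aapl', 'googl', 'nvda']

# Executor.map() yields results in input order, regardless of which
# worker thread finishes first.
with ThreadPoolExecutor(max_workers=len(tickers)) as executor:
    ev_by_ticker = dict(zip(tickers, executor.map(fetch_ev, tickers)))

print(ev_by_ticker)
```

With the real request in place of the sleep, fetch_ev() would build the URL from the symbol and return the value it extracts, so no shared list or lock is needed.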

Upvotes: 1
