Reputation: 15
I have a thread_function(ticker) that takes a stock symbol as a string, checks whether it meets a condition and, if it does, appends it to a list. The capitulation(ticker, df) function returns either the stock symbol or nothing at all. Because I loop through 5000+ tickers and pull data for each of them, I have implemented threading. Without threading, this code takes at least half an hour to finish, but it works and I get data in the results list at the end. With threading, it finishes in less than a second, but the results list is empty at the end. When I put breakpoints in the capitulation() function, execution never stops there, although it does go into the pull_data() function, which downloads the data from Yahoo Finance. Below is the code:
tickers = pd.read_csv("./text_files/stock_list.csv")

def thread_function(ticker):
    try:
        df = pull_data(ticker)
        if not df.empty:
            if capitulation(ticker, df):
                results.append(ticker)
    except:
        pass
    with print_lock:
        print(threading.current_thread().name, ticker)

def threader():
    while True:
        worker = q.get()
        thread_function(worker)
        q.task_done()

print_lock = threading.Lock()
q = Queue()

# how many threads are we going to allow
for x in range(10):
    t = threading.Thread(target=threader)
    t.daemon = True
    t.start()

start = time.time()
for ticker in tickers.yahoo_symbol:
    q.put(lambda: thread_function(ticker))
q.join()
print('Entire job took:', time.time()-start)
EDIT:
I have also tried multiprocessing's Pool with apply_async, as in the code below, but it still does not produce the list I get when running sequentially:
def log_result(result):
    if result is not None:
        results.append(result)

pool = Pool(25)
start = time.time()
for x in tickers.yahoo_symbol:
    pool.apply_async(thread_function, args=(x,), callback=log_result)
pool.close()
pool.join()
print(results)
print('Entire job took:', time.time() - start)
In this case thread_function() is moved to another file, since multiprocessing otherwise throws an AttributeError.
Upvotes: 0
Views: 376
Reputation: 3875
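With multiprocessing, each worker process gets its own copy of these variables, so appending to results inside a worker does not update the list in your main script. That is why the list is empty in the Pool version. Threads, by contrast, share memory with the main script, so the threaded version fails for a different reason: q.put(lambda: thread_function(ticker)) puts a lambda on the queue, and threader() then passes that lambda to thread_function() as the ticker. pull_data() most likely fails on it and the bare except hides the error, which would explain why capitulation() is never reached.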
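For the threaded version in the question, it is worth checking what actually goes onto the queue: the worker passes whatever it dequeues straight to thread_function() as the ticker. Below is a minimal sketch of the same queue pattern that enqueues the ticker strings themselves and reports exceptions instead of swallowing them. The pull_data() and capitulation() bodies here are hypothetical stand-ins, not the real implementations, and the ticker list and results_lock are made up for illustration.

```python
import queue
import threading

results = []
results_lock = threading.Lock()

def pull_data(ticker):
    # Hypothetical stand-in for the question's Yahoo Finance download.
    return {"symbol": ticker, "close": len(ticker)}

def capitulation(ticker, df):
    # Hypothetical condition: keep symbols with four or more characters.
    return ticker if df["close"] >= 4 else None

def thread_function(ticker):
    try:
        df = pull_data(ticker)
        if capitulation(ticker, df):
            with results_lock:
                results.append(ticker)
    except Exception as exc:
        # Report failures so a bad argument is visible instead of silent.
        print(f"{ticker!r} failed: {exc!r}")

def threader(q):
    while True:
        ticker = q.get()
        try:
            thread_function(ticker)
        finally:
            # Always mark the item done so q.join() can return.
            q.task_done()

q = queue.Queue()
for _ in range(4):
    threading.Thread(target=threader, args=(q,), daemon=True).start()

for ticker in ["AAPL", "F", "MSFT", "GE"]:
    q.put(ticker)  # enqueue the symbol itself, not a lambda

q.join()
print(sorted(results))
```

Replacing the bare except with one that prints the exception is the quickest way to see why a run finishes suspiciously fast.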
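You should look at the multiprocessing library, specifically Pool and apply_async. These tools let you return results from the worker processes to the main process, but only if the worker function actually returns a value: as posted, thread_function() appends to results and returns nothing, so the log_result callback only ever receives None. Have it return the ticker (or None) instead.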
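The "return the result instead of appending to a global" idea can be sketched as follows. Note that this sketch swaps in concurrent.futures.ThreadPoolExecutor from the standard library rather than multiprocessing.Pool, on the assumption that the work is dominated by downloads; the same shape works with Pool.map as long as the worker returns a value. check_ticker() and scan() are hypothetical names, and pull_data() and capitulation() are placeholder stand-ins for the question's functions.

```python
from concurrent.futures import ThreadPoolExecutor

def pull_data(ticker):
    # Hypothetical stand-in for the question's Yahoo Finance download.
    return {"symbol": ticker, "close": len(ticker)}

def capitulation(ticker, df):
    # Hypothetical condition: keep symbols with four or more characters.
    return ticker if df["close"] >= 4 else None

def check_ticker(ticker):
    """Return the ticker if it meets the condition, otherwise None."""
    try:
        df = pull_data(ticker)
        return capitulation(ticker, df)
    except Exception as exc:
        print(f"{ticker!r} failed: {exc!r}")
        return None

def scan(tickers, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields return values in the order of the inputs.
        hits = executor.map(check_ticker, tickers)
        return [hit for hit in hits if hit is not None]

matches = scan(["AAPL", "F", "MSFT", "GE"])
print(matches)
```

Because the worker returns its result, the main script builds the list itself and no shared state is needed.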
Upvotes: 1