Reputation: 233
Originally, Pool.map was sufficient for threading my code, as only one argument (an iterable) was passed to my function. Now I need to pass multiple arguments to the function, and I'm having trouble using Pool.starmap.
I attempted to use zip alongside Pool.map to no avail.
Here's my current code:
def get_links_on_page(job_title, page_num):
    page = requests.get("%s/jobs?q=%s&l=%s%%2C%s&start=%s" % (__SITE_BASE__, job_title.replace(' ', '+'), 'City', 'PROV', str(page_num*25)), verify=False)
    print(page.url)
    soup = BeautifulSoup(page.content, 'html.parser')
    return [link.a.get('href') for link in soup.find_all('div', {'class': 'title'})]

def get_all_links(job_title):
    """
    :param: job_title (string): A string representing the job's title
    """
    all_links = []
    pool = ThreadPool(processes=20)
    all_links.extend(pool.starmap(get_links_on_page, (job_title, [i for i in range(1, 5)])))
    pool.close()
    return all_links
This gives me an error like:
TypeError: '<=' not supported between instances of 'list' and 'int'
I also attempted to pass in the two arguments as an iterable like so:
def get_all_links(job_title):
    all_links = []
    pool = ThreadPool(processes=20)
    all_links.extend(pool.starmap(get_links_on_page, [job_title, [i for i in range(1, 5)]]))  # [func(job_title, 1), func(job_title, 2), func(job_title, 3) ...]
    pool.close()
    return all_links
And that would equate to 18 arguments and thus throw an error. I'm currently reading the docs here:
https://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool.starmap
But I'm having trouble getting the syntax right.
Any help would be really appreciated!
Upvotes: 3
Views: 9388
Reputation: 21664
You were on the right track with zip(); you just need to repeat() the job_title:
list(zip(itertools.repeat("jobname"), range(1, 5)))
# [('jobname', 1), ('jobname', 2), ('jobname', 3), ('jobname', 4)]
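This shape matters because starmap unpacks each element of the iterable into positional arguments, so every element has to be one complete argument tuple. A minimal sketch of that behaviour, where describe is a hypothetical stand-in for get_links_on_page that makes no network requests:

```python
from itertools import repeat
from multiprocessing.pool import ThreadPool


def describe(job_title, page_num):
    # Hypothetical stand-in for get_links_on_page; no network access.
    return f"{job_title}:{page_num}"


# One (job_title, page_num) tuple per call we want to make.
args = list(zip(repeat("jobname"), range(1, 5)))

with ThreadPool(4) as pool:
    # Each tuple is unpacked, so this calls describe("jobname", 1), and so on.
    pooled = pool.starmap(describe, args)

# The same calls made sequentially, for comparison.
sequential = [describe(*pair) for pair in args]
```

Both lists should hold the same results in the same order; only the execution differs.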
So for your example:
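from itertools import repeat
from multiprocessing.pool import ThreadPool

def get_all_links(job_title, n):  # n would be 4 in your example
    # Pair the same job_title with every page number: (job_title, 1), (job_title, 2), ...
    iterable = zip(repeat(job_title), range(1, n + 1))
    with ThreadPool(n) as pool:
        all_links = pool.starmap(get_links_on_page, iterable)
    return all_links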
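Note that starmap returns one result per call, so all_links here is a list of per-page lists. If you want a single flat list, something like the sketch below might help. It also shows functools.partial with plain Pool.map as an alternative to zip and repeat. The names fake_links_on_page, get_all_links_flat and get_all_links_partial are hypothetical, and the fake function only returns made-up paths so the example runs without a network connection:

```python
from functools import partial
from itertools import chain, repeat
from multiprocessing.pool import ThreadPool


def fake_links_on_page(job_title, page_num):
    # Hypothetical stand-in for get_links_on_page: returns two made-up links per page.
    slug = job_title.replace(" ", "+")
    return [f"/jobs/{slug}/{page_num}/a", f"/jobs/{slug}/{page_num}/b"]


def get_all_links_flat(job_title, n):
    # starmap yields one list per page; chain.from_iterable flattens them.
    with ThreadPool(n) as pool:
        per_page = pool.starmap(
            fake_links_on_page, zip(repeat(job_title), range(1, n + 1))
        )
    return list(chain.from_iterable(per_page))


def get_all_links_partial(job_title, n):
    # Alternative: fix job_title with partial, then plain map over the page numbers.
    with ThreadPool(n) as pool:
        per_page = pool.map(partial(fake_links_on_page, job_title), range(1, n + 1))
    return list(chain.from_iterable(per_page))
```

The partial version only works this neatly because the fixed argument comes first in the function signature; zip with repeat is more general when the constant argument sits elsewhere.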
Upvotes: 4