Reputation: 31
I am trying to write a crawler for a web security project, and I'm seeing strange behaviour in a method that uses multiprocessing.
What should this method do? It iterates over the target web pages found so far, each with a list of discovered query parameters. For each web page, it should apply the method phase1 (my attack logic) to every query parameter associated with that page.
Meaning, if I have http://example.com/sub.php with page and secret as query parameters, and http://example.com/s2.php with topsecret as a parameter, it should do the following:
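Something like this, as a rough sketch (the real phase1 takes many more arguments, which I've left out here):

phase1("http://example.com/sub.php", "page", payload)       # once per payload
phase1("http://example.com/sub.php", "secret", payload)     # once per payload
phase1("http://example.com/s2.php", "topsecret", payload)   # once per payload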
I can tell whether an attack is actually running based on the timing and on the output of phase1.
What actually happens
Only the first attack is executed; the following calls to apply_async are ignored. However, the code still cycles through the loops, since it keeps printing the output from the for loops above.
What is going wrong here? Why is the attack routine not triggered? I have looked at the docs for multiprocessing, but they don't explain this behaviour.
Some answers to related problems suggested using terminate and join, but isn't this done implicitly here, since I'm using the with statement?
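As far as I understand the docs, the context-manager form is roughly equivalent to the following (simplified example; note that Pool.__exit__ calls terminate(), not close() followed by join()):

from multiprocessing import Pool

if __name__ == "__main__":
    pool = Pool(processes=4)
    try:
        jobs = [pool.apply_async(pow, args=(2, n)) for n in range(5)]
        print([job.get() for job in jobs])   # [1, 2, 4, 8, 16]
    finally:
        pool.terminate()   # this is what leaving the with block does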
Also, this question (Multiprocessing pool 'apply_async' only seems to call function once) sounds very similar, but it is different from my problem. Unlike in that question, my problem is not that only one worker executes the code, but that my X workers are only spawned once (instead of Y times).
What I've tried: moving the with Pool(...) block outside the loops, but nothing changed.
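That is, roughly this structure (same loop body as in the method below, only the pool placement changed):

with Pool(processes=processes) as pool:
    for victim, paramlist in siteparams.items():
        for param in paramlist:
            res = [pool.apply_async(phase1, args=(1,victim,victim2,param,None,"",verbose,depth,l,file,authcookie,"",)) for l in paysplit]
            # ...collect results as below...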
The method in question is the following:
def analyzeParam(siteparams, paysplit, victim2, verbose, depth, file, authcookie):
    result = {}
    subdir = parseUrl(viclist[0])
    for victim, paramlist in siteparams.items():
        sub = {}
        print("\n{0}[INFO]{1} param{4}|{2} Attacking {3}".format(color.RD, color.END + color.O, color.END, victim, color.END+color.RD))
        time.sleep(1.5)
        for param in paramlist:
            payloads = []
            nullbytes = []
            print("\n{0}[INFO]{1} param{4}|{2} Using {3}\n".format(color.RD, color.END + color.O, color.END, param, color.END+color.RD))
            time.sleep(1.5)
            with Pool(processes=processes) as pool:
                res = [pool.apply_async(phase1, args=(1,victim,victim2,param,None,"",verbose,depth,l,file,authcookie,"",)) for l in paysplit]
                for i in res:
                    #fetch results
                    tuples = i.get()
                    payloads += tuples[0]
                    nullbytes += tuples[1]
            sub[param] = (payloads, nullbytes)
            time.sleep(3)
        result[victim] = sub
    if not os.path.exists(cachedir+subdir):
        os.makedirs(cachedir+subdir)
    with open(cachedir+subdir+"spider-phase2.json", "w+") as f:
        json.dump(result, f, sort_keys=True, indent=4)
    return result
Some technical information:
How do I fix this? Thanks!
Upvotes: 1
Views: 269
Reputation: 31
Big kudos to jasonharper for finding the issue! The issue was not the code structure above, but the variable paysplit, which is a generator and was exhausted after the first pass.
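In case it helps anyone else, here is a minimal standalone sketch of the effect (the payload strings are just made-up examples):

def payload_gen():
    yield "'--"
    yield '" OR 1=1 --'
    yield "<script>alert(1)</script>"

paysplit = payload_gen()

print([p for p in paysplit])   # first pass: all three payloads
print([p for p in paysplit])   # second pass: [] -- the generator is exhausted

In my method above, that means the list comprehension around pool.apply_async only builds a non-empty list of jobs for the very first parameter; for every later parameter it iterates an already exhausted generator, submits nothing, and the loop just prints and moves on. The fix was to materialize the payloads once (for example paysplit = list(paysplit) before the loops, or passing a list into analyzeParam in the first place), so that every pass over them sees all payloads.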
Again, thank you for pointing it out!
Best
Upvotes: 1