Reputation: 63
For a web-scraping analysis I need two loops that run permanently: one returns a list of websites that is updated every x minutes, while the other analyses the sites (old and new ones) every y seconds. The code below illustrates what I am trying to do, but it doesn't work. (The code has been edited to incorporate the answers and my own research.)
from multiprocessing import Process
import time, random
from threading import Lock
from collections import deque

class MyQueue(object):
    def __init__(self):
        self.items = deque()
        self.lock = Lock()

    def put(self, item):
        with self.lock:
            self.items.append(item)

    # Example pointed at in [this][1] answer
    def get(self):
        with self.lock:
            return self.items.popleft()

def a(queue):
    while True:
        x=[random.randint(0,10), random.randint(0,10), random.randint(0,10)]
        print 'send', x
        queue.put(x)
        time.sleep(10)

def b(queue):
    try:
        while queue:
            x = queue.get()
            print 'receive', x
            for i in x:
                print i
            time.sleep(2)
    except IndexError:
        print queue.get()

if __name__ == '__main__':
    q = MyQueue()
    p1 = Process(target=a, args=(q,))
    p2 = Process(target=b, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
So, this is my first Python project after an online introduction course, and I am struggling big time. I understand now that the functions don't truly run in parallel, as b does not start until a is finished (I used this answer and tinkered with the timer and while True). EDIT: Even after using the approach given in the answer, I think this is still the case, as queue.get() throws an IndexError saying the deque is empty. I can only explain that with process a not finishing, because when I print queue.get() immediately after .put(x), it is not empty.
I eventually want an output like this:
send [3,4,6]
3
4
6
3
4
send [3,8,6,5] # the code above always gives 3 entries, but in my project
3 # the length varies
8
6
5
3
8
6
.
.
What do I need in order to have two truly parallel loops, where one returns an updated list every x minutes that the other loop needs as the basis for its analysis? Is Process really the right tool here? And where can I find good information about designing my program?
Upvotes: 1
Views: 1569
Reputation: 577
I did something a little like this a while ago. I think using Process is the correct approach, but if you want to pass data between processes then you should probably use a Queue.
https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes
Create the queue first and pass it into both processes. One can write to it, the other can read from it.
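To illustrate why a hand-rolled, deque-based queue like the one in the question never delivers anything to the other process, here is a minimal sketch. It is written in Python 3 syntax (the thread itself uses Python 2), a plain deque stands in for MyQueue, and the names fill_deque, fill_queue and demo are made up for this example. Each process works on its own copy of an ordinary object, whereas multiprocessing.Queue actually transports the item back to the parent.

```python
from collections import deque
from multiprocessing import Process, Queue


def fill_deque(items):
    # Runs in the child: this appends to the child's own copy of the deque.
    items.append("from child")


def fill_queue(queue):
    # Runs in the child: the item is sent to whoever reads the queue.
    queue.put("from child")


def demo():
    # 1) An ordinary container is copied into the child, not shared with it.
    local_items = deque()
    p = Process(target=fill_deque, args=(local_items,))
    p.start()
    p.join()

    # 2) A multiprocessing.Queue carries the item across the process boundary.
    shared = Queue()
    p = Process(target=fill_queue, args=(shared,))
    p.start()
    received = shared.get(timeout=5)  # wait for the child's item
    p.join()

    # The parent's deque is still empty; only the Queue delivered data.
    return len(local_items), received


if __name__ == "__main__":
    print(demo())
```

Run as a script, this should print (0, 'from child'): the append made in the child never shows up in the parent's deque, which matches the "deque is empty" IndexError described in the question.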
One issue I remember is that the reading process will block on the queue until something is pushed to it, so you may need to push a special 'terminate' message of some kind to the queue when process 1 is done so process 2 knows to stop.
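A rough sketch of that 'terminate' message idea, in Python 3 syntax. The choice of None as the sentinel and the names STOP, producer, consumer and handle are assumptions made for this example, not part of any library; any value that can never be real data would do.

```python
from multiprocessing import Process, Queue

STOP = None  # hypothetical sentinel: a value that is never sent as real data


def producer(queue, batches):
    for batch in batches:
        queue.put(batch)
    queue.put(STOP)  # tell the reader that nothing more is coming


def consumer(queue, handle=print):
    while True:
        item = queue.get()  # blocks until something arrives
        if item is STOP:
            break  # leave the loop instead of blocking forever
        handle(item)


if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q, [[3, 4, 6], [3, 8, 6, 5]]))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()  # returns once the consumer has seen STOP
```

With more than one reading process, the writer would need to push one sentinel per reader, since each get() removes the item it returns.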
EDIT: Simple example. This doesn't include a clean way to stop the processes, but it shows how you can start 2 new processes and pass data from one to the other. Note that b calls get(False), which does not block: when the queue is empty it raises an exception, which is ignored, so b keeps printing the last list it received until a sends a new one.
from multiprocessing import Process, Queue
from Queue import Empty  # Python 2 module name; it is "queue" in Python 3
import time, random

def a(queue):
    while True:
        x=[random.randint(0,10), random.randint(0,10), random.randint(0,10)]
        print 'send', x
        queue.put(x)
        time.sleep(5)

def b(queue):
    x = []
    while True:
        time.sleep(1)
        try:
            # Non-blocking read: raises Empty if nothing new has arrived
            x = queue.get(False)
            print 'receive', x
        except Empty:
            pass  # keep working with the previous list
        for i in x:
            print i

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=a, args=(q,))
    p2 = Process(target=b, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
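The polling step in b can also be pulled out into a small helper. The sketch below is a variation rather than the exact logic above: it drains everything that is waiting and keeps only the newest list, which may matter if the sender is ever faster than the reader. It uses Python 3 names (the module is queue rather than Queue), and latest is a made-up function name.

```python
import queue  # only needed for the Empty exception


def latest(q, current):
    """Return the newest item waiting in q, or current if q is empty."""
    while True:
        try:
            current = q.get_nowait()  # same as q.get(False)
        except queue.Empty:
            return current
```

Inside the loop of b this would replace the try/except with a single call such as x = latest(q, x), with the queue parameter renamed to q so it does not shadow the queue module.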
Upvotes: 2