geb12

mechanize + async browser call

I'm looking for a way to make lots of asynchronous web requests without waiting for each response.

Here is my current code:

import mechanize
from mechanize._opener import urlopen
from mechanize._form import ParseResponse
from multiprocessing import Pool

brow = mechanize.Browser()
brow.open('https://website.com')

#Login
brow.select_form(nr = 0)

brow.form['username'] = 'user'
brow.form['password'] = 'password'
brow.submit()

while True:
    # (should be async) keep opening the page until some condition is fulfilled
    brow.open('https://website.com/needthiswebsite')

The problem with the code above is that if I open the page with two browsers, bro2 has to wait for bro1 to finish before it can start (the calls are blocking):

bro1.open('https://website.com/needthiswebsite')
bro2.open('https://website.com/needthiswebsite')

Attempted solution:

# Pseudocode: result.something() and WHATIWANT are placeholders,
# brow1 is the logged-in Browser from the code above

# Global flag: keep polling while this is True
state = True

def openWebsite(browser):
    result = browser.open('https://website.com/needthiswebsite')
    return result.something() == WHATIWANT

def updateState(found):
    # The callback runs in the parent process; `global` is needed to rebind the flag
    global state
    if found:
        state = False

pool = Pool(processes=1)
while state:
    # I spam this call until one of them returns a positive answer.
    # Sending brow1 to the worker process requires pickling it.
    pool.apply_async(openWebsite, [brow1], callback=updateState)

I tried to implement a solution similar to the answer to the Stack Overflow question "Asynchronous method call in Python?".

The problem is that I get an error when I try to use pool.apply_async(brow.open()).

Error message:

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I have tried many things to fix the PicklingError, but nothing seems to work.

Any help would be really appreciated :)


Answers (1)

dano

The mechanize.Browser object is not pickleable, so it can't be passed to pool.apply_async (or any other method that needs to send the object to a child process):

>>> b = mechanize.Browser()
>>> import pickle
>>> pickle.dumps(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 725, in save_inst
    save(stuff)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.7/pickle.py", line 615, in _batch_appends
    save(x)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 725, in save_inst
    save(stuff)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle instancemethod objects
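The traceback only names the first object pickle trips over. A quick way to see which attributes of an object are unpicklable is to try pickling each one separately. The sketch below does that; find_unpicklable_attributes and the FakeBrowser class are hypothetical stand-ins (FakeBrowser does not model mechanize's real internals), and the helper only inspects the top level of the instance's __dict__.

```python
import pickle
import threading


def find_unpicklable_attributes(obj):
    """Return the sorted names of obj's instance attributes that pickle rejects.

    Only the top level of obj.__dict__ is checked (obj must have a __dict__);
    an attribute reported here may itself contain the real culprit deeper down.
    """
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:  # TypeError, PicklingError or AttributeError, depending on the object
            bad.append(name)
    return sorted(bad)


class FakeBrowser(object):
    """Hypothetical stand-in for a Browser-like object (not mechanize's real attributes)."""

    def __init__(self):
        self.url = "https://website.com"      # plain strings pickle fine
        self.lock = threading.Lock()          # locks cannot be pickled
        self.on_response = lambda resp: resp  # lambdas cannot be pickled
```

Calling find_unpicklable_attributes(FakeBrowser()) reports ['lock', 'on_response']. The same helper can be pointed at a real mechanize.Browser instance to list the offending top-level attributes.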

The easiest thing to do is create the Browser instance inside each sub-process, rather than in the parent:

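import mechanize

def openWebsite(url):
    # Each worker builds and logs in its own Browser,
    # so no Browser object ever has to be pickled.
    brow = mechanize.Browser()
    brow.open('https://website.com')

    # Login
    brow.select_form(nr=0)

    brow.form['username'] = 'user'
    brow.form['password'] = 'password'
    brow.submit()

    result = brow.open(url)
    # result.something() and WHATIWANT are placeholders from the question
    return result.something() == WHATIWANT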
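To run openWebsite for many requests at once and stop at the first positive answer (the loop the question is after), one option is concurrent.futures, shown here as a sketch assuming Python 3. Only the URL string is passed to each worker, so nothing unpicklable is sent. first_match is a hypothetical helper name, and check stands for a function like openWebsite that returns True or False.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_match(urls, check, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Run check(url) concurrently; return the first url whose check is True.

    Returns None if no call succeeds. An exception raised inside check is
    re-raised here by future.result().
    """
    with executor_cls(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted with.
        futures = {executor.submit(check, url): url for url in urls}
        for future in as_completed(futures):
            if future.result():
                # Cancel attempts that have not started yet; calls already
                # running finish before the executor shuts down.
                for other in futures:
                    other.cancel()
                return futures[future]
    return None
```

For example, first_match(['https://website.com/needthiswebsite'] * 20, openWebsite) would submit 20 attempts and stop scheduling new ones once one succeeds. Because openWebsite creates its own Browser on every call, no Browser is shared between threads. Threads are usually enough for I/O-bound requests like these; to use processes instead, pass executor_cls=ProcessPoolExecutor, keep check at module level, and call first_match under an if __name__ == '__main__': guard.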

Ideally you'd be able to log in once with the Browser object in the parent process and then make parallel requests across many processes. However, it may take significant effort to make the object pickleable, if it's possible at all: even if you manage to remove the instancemethod object that's causing the current error, the Browser could contain many more unpickleable objects.
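A middle ground is to log in once per worker process rather than once per request, using the initializer argument of multiprocessing.Pool. The sketch below assumes make_browser is a hypothetical module-level function that returns a logged-in mechanize.Browser (for example, the login steps from openWebsite); init_worker, fetch and run are also illustrative names. Each worker keeps its Browser in a process-local global, so only URL strings and page contents travel between processes.

```python
import multiprocessing

# Process-local state: every worker process has its own copy of this global,
# so the Browser stored here is never pickled or shared between processes.
_browser = None


def init_worker(make_browser):
    """Pool initializer: runs once in each worker process.

    make_browser is a hypothetical zero-argument function returning a
    logged-in Browser; it must be defined at module level so it can be
    pickled by reference when passed through initargs.
    """
    global _browser
    _browser = make_browser()


def fetch(url):
    """Fetch url with this process's Browser and return the page contents."""
    response = _browser.open(url)
    return response.read()


def run(urls, make_browser, processes=4):
    """Fetch all urls in parallel; results come back in the order of urls."""
    pool = multiprocessing.Pool(processes, initializer=init_worker,
                                initargs=(make_browser,))
    try:
        return pool.map(fetch, urls)
    finally:
        pool.close()
        pool.join()
```

The check against WHATIWANT can then happen in the parent process on the returned page contents. As with any multiprocessing code, call run under an if __name__ == '__main__': guard on platforms that spawn new processes.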
