Reputation: 142
I am trying to create a Flask application with RQ/Redis that stores tasks which return the data scraped by the Playwright library.
What I am trying to do is create a global Playwright browser instance, so that when different users request that data from Flask, the same instance is reused. But I run into a problem when passing the browser instance as an argument to the task function via rq.enqueue.
Using monkey.patch_all() from gevent doesn't seem to help either.
My code (app.py) is as follows:
from gevent import monkey
monkey.patch_all()

import redis
from rq import Queue
from playwright.sync_api import sync_playwright
from flask import (
    Flask,
    render_template,
    request,
    make_response,
)
from src.flask.utils import (
    return_map_info,
    get_cookie
)

playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)

r = redis.Redis(host='localhost')
q = Queue(connection=r)

app = Flask(__name__)


@app.route('/add', methods=('GET', 'POST'))
def add_task():
    """
    This function is used to get the data from the post form
    and then add the task into the redis queue; the data will be
    returned later, after the task is complete.
    """
    jobs = q.jobs
    message = None
    if request.method == "POST":
        url = request.form['url']
        search_type = request.form['search_type']
        task = q.enqueue(return_map_info,
                         args=(browser,),
                         kwargs={
                             'url': url,
                             'type': search_type
                         })
        job_id = task.id
        cookie_key = get_cookie(request.cookies.get('cookieid'))
        jobs = q.jobs
        q_length = len(q)
        r.hset(cookie_key, url, job_id)
        message = f"The result is {task} and the jobs queued are {q_length}"
        resp = make_response(render_template(
            "add.html", message=message, jobs=jobs))
        resp.set_cookie("cookieid", cookie_key)
        return resp
    return render_template(
        "add.html", message=message, jobs=jobs)
Upvotes: 0
Views: 85
Reputation: 637
From what I understand, you want to show the scraped results to each and every user. You don't need a queue just for that; instead, find an appropriate way to store the scraped data, filter it based on the user's input, and send it in the response.
But if you are running an RQ worker to scrape sites based on user input, you have to initiate the Playwright instance inside the worker, run the job that scrapes the data, and store the results in a database, which can later be used as mentioned above.
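A minimal sketch of what that worker-side task could look like, assuming a return_map_info in src/flask/utils.py that launches its own browser per job (the scraping logic and the returned dict are placeholders):

from playwright.sync_api import sync_playwright

def return_map_info(url, type):
    # Launch Playwright inside the worker process for this job only;
    # nothing non-serialisable ever crosses the Redis boundary.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        title = page.title()  # placeholder for the real scraping logic
        browser.close()
    # Return (or store in a database) plain, serialisable data only.
    return {"url": url, "type": type, "title": title}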
RQ is a task queue based on Redis. A separate RQ worker process runs, listening on the configured Redis queues. So whatever input (args, kwargs) you give to the job must be serialisable in order to be stored in Redis; the listening worker then reads and deserialises that data to get the actual input. That is why you can't pass the browser object itself through q.enqueue.
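So in app.py the enqueue call would only pass plain values. A rough sketch, assuming the worker-side return_map_info above; the URL and search type shown are just example strings:

import redis
from rq import Queue
from src.flask.utils import return_map_info

r = redis.Redis(host='localhost')
q = Queue(connection=r)

# Only strings go into Redis; RQ serialises them, and the worker
# deserialises them before calling return_map_info.
task = q.enqueue(return_map_info,
                 kwargs={'url': 'https://example.com', 'type': 'place'})
print(task.id)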
Upvotes: 1