Reputation: 135
I have an API endpoint that processes an uploaded CSV file and passes it to a scraper function; once scraping is done, the result is returned as a CSV file generated by the scraper. Everything works as expected locally, but after I deploy to Heroku, a few seconds after uploading the CSV file the request fails with a 503 response, which means the request timed out. How can I handle this timeout properly, so that the request doesn't fail and keeps going until the scraper has finished and the file has been downloaded?
import fastapi as _fastapi
from fastapi.responses import HTMLResponse, FileResponse
import shutil
import os

from scraper import run_scraper

app = _fastapi.FastAPI()


@app.get("/")
def index():
    content = """
    <body>
    <form method="post" action="/api/v1/scraped_csv" enctype="multipart/form-data">
    <input name="csv_file" type="file" multiple>
    <input type="submit">
    </form>
    </body>
    """
    return HTMLResponse(content=content)


@app.post("/api/v1/scraped_csv")
async def extract_ads(csv_file: _fastapi.UploadFile = _fastapi.File(...)):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    # run_scraper is assumed to return the name of the CSV it generates
    clean_file = await run_scraper(temp_file)
    csv_path = os.path.abspath(clean_file)
    return FileResponse(path=csv_path, media_type="text/csv", filename=clean_file)


def _save_file_to_disk(uploaded_file, path=".", save_as="default"):
    # Persist the upload to disk so the scraper can read it by path
    extension = os.path.splitext(uploaded_file.filename)[-1]
    temp_file = os.path.join(path, save_as + extension)
    with open(temp_file, "wb") as buffer:
        shutil.copyfileobj(uploaded_file.file, buffer)
    return temp_file
Here is the link to the app:
https://fast-api-google-ads-scraper.herokuapp.com/
Upvotes: 1
Views: 8020
Reputation: 572
One possible solution would be what @fchancel is suggesting: run the scraping as a background task via a Redis queue and inform the user that a job with a job_id (the key in Redis) has been created. The worker dynos can store the results of the background job in blob storage, and you can then fetch the results from the blob storage using the job_id.
For checking the status of the job, have a look at this question.
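For illustration, here is a minimal sketch of that pattern using RQ. It assumes a Redis instance reachable through a REDIS_URL config var, reuses _save_file_to_disk and run_scraper from the question, and leaves the blob-storage step out for brevity:

import asyncio
import os

import fastapi as _fastapi
from redis import Redis
from rq import Queue
from rq.job import Job

from scraper import run_scraper

app = _fastapi.FastAPI()
redis_conn = Redis.from_url(os.environ["REDIS_URL"])
queue = Queue(connection=redis_conn)

def scrape_job(temp_file: str):
    # run_scraper is async in the question, so drive it to completion
    # inside the worker; the worker dyno must import this same module
    return asyncio.run(run_scraper(temp_file))

@app.post("/api/v1/scraped_csv", status_code=202)
async def extract_ads(csv_file: _fastapi.UploadFile = _fastapi.File(...)):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    # Enqueue the long-running scrape and answer right away with the job id
    job = queue.enqueue(scrape_job, temp_file)
    return {"job_id": job.get_id(), "status_url": f"/api/v1/jobs/{job.get_id()}"}

@app.get("/api/v1/jobs/{job_id}")
def job_status(job_id: str):
    # Clients poll this endpoint; job.result holds scrape_job's return value
    job = Job.fetch(job_id, connection=redis_conn)
    return {"job_id": job_id, "status": job.get_status(), "result": job.result}

A separate worker dyno would run rq worker against the same Redis instance and, as described above, could push the finished CSV to blob storage under the job_id.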
Upvotes: 1
Reputation: 2699
I currently see two possibilities.
First, you can increase the waiting time before the timeout.
If you use Gunicorn, you can pass -t INT or --timeout INT, where the value is a positive number or 0. Setting it to 0 disables timeouts for all workers entirely, which has the effect of infinite timeouts.
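For instance, a Procfile entry along these lines would apply that flag (a sketch assuming the FastAPI app is served by Gunicorn with Uvicorn workers; main:app is a placeholder for your actual module path):

web: gunicorn -k uvicorn.workers.UvicornWorker --timeout 0 main:app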
Second, you can use an asynchronous request/response pattern: you respond immediately with a 202 Accepted and tell the client where they can track the status of the task, but this requires creating a new endpoint, new logic, and so on.
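As a rough illustration of that second option, here is a minimal sketch using FastAPI's built-in BackgroundTasks (no separate worker dyno); the in-memory jobs dict and the endpoint paths are assumptions for brevity, and it reuses _save_file_to_disk and run_scraper from the question:

import uuid

import fastapi as _fastapi

from scraper import run_scraper

app = _fastapi.FastAPI()
jobs = {}  # job_id -> "running" | "done" | "failed"; lost on restart

async def _run_job(job_id: str, temp_file: str):
    # Executed after the response is sent; record the outcome for polling
    try:
        await run_scraper(temp_file)
        jobs[job_id] = "done"
    except Exception:
        jobs[job_id] = "failed"

@app.post("/api/v1/scraped_csv", status_code=202)
async def extract_ads(
    background_tasks: _fastapi.BackgroundTasks,
    csv_file: _fastapi.UploadFile = _fastapi.File(...),
):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    job_id = uuid.uuid4().hex
    jobs[job_id] = "running"
    # Schedule the scraper to run once the 202 response has gone out
    background_tasks.add_task(_run_job, job_id, temp_file)
    return {"job_id": job_id, "status_url": f"/api/v1/status/{job_id}"}

@app.get("/api/v1/status/{job_id}")
def job_status(job_id: str):
    # Clients poll here until the job is "done", then fetch the CSV
    return {"job_id": job_id, "status": jobs.get(job_id, "unknown")}

Note that the web dyno still does the scraping work here; what changes is that the upload request itself returns within Heroku's 30-second router window, so it no longer ends in a 503.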
Upvotes: 1