dougj

Reputation: 135

FastAPI: handle request timeout

I have an API endpoint that processes an uploaded CSV file and passes it to a scraper function; once scraping is done, the scraped result is downloaded as a CSV file generated by the scraper. Everything works as I expected, but when I deploy it on Heroku, a few seconds after uploading the CSV file I get a 503 response, which means the request timed out. So I want to ask: how can I handle the request timeout properly, in the sense that it won't crash and keeps running until the scraper has finished and the file has been downloaded?

import fastapi as _fastapi
from fastapi.responses import HTMLResponse, FileResponse
import shutil
import os

from scraper import run_scraper


app = _fastapi.FastAPI()


@app.get("/")
def index():
    content = """
<body>
<form method="post" action="/api/v1/scraped_csv" enctype="multipart/form-data">
<input name="csv_file" type="file" multiple>
<input type="submit">
</form>
</body>
    """
    return HTMLResponse(content=content)


@app.post("/api/v1/scraped_csv")
async def extract_ads(csv_file: _fastapi.UploadFile = _fastapi.File(...)):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    await run_scraper(temp_file)
    # clean_file is the path of the result CSV generated by the scraper
    csv_path = os.path.abspath(clean_file)
    return FileResponse(path=csv_path, media_type="text/csv", filename=clean_file)


def _save_file_to_disk(uploaded_file, path=".", save_as="default"):
    # Save the uploaded file to disk and return the path it was written to.
    extension = os.path.splitext(uploaded_file.filename)[-1]
    temp_file = os.path.join(path, save_as + extension)
    with open(temp_file, "wb") as buffer:
        shutil.copyfileobj(uploaded_file.file, buffer)
    return temp_file

Here is the link to the app.

https://fast-api-google-ads-scraper.herokuapp.com/

Upvotes: 1

Views: 8020

Answers (2)

Shivam Miglani

Reputation: 572

One possible solution would be what @fchancel is suggesting: run scraping as a background task via a Redis Queue and inform the user that a job with a job_id (a key in Redis) has been created. The worker dynos can store the results of the background job in blob storage, and you can then fetch your results from the blob storage using the job_id.

To check the status of the job, have a look at this question.
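
A rough sketch of that enqueue-and-poll flow with RQ, assuming a Redis instance is reachable via the REDIS_URL config var, that run_scraper can be called as a plain synchronous function on the worker dyno, and reusing _save_file_to_disk from the question; the queue name and the /jobs endpoint are made up for illustration:

import os
import fastapi as _fastapi
from redis import Redis
from rq import Queue
from rq.job import Job

from scraper import run_scraper

app = _fastapi.FastAPI()
redis_conn = Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
queue = Queue("scraper", connection=redis_conn)


@app.post("/api/v1/scraped_csv", status_code=202)
async def extract_ads(csv_file: _fastapi.UploadFile = _fastapi.File(...)):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    # The worker dyno runs run_scraper and should upload the result CSV to blob storage.
    job = queue.enqueue(run_scraper, temp_file, job_timeout=600)
    return {"job_id": job.get_id(), "status_url": f"/api/v1/jobs/{job.get_id()}"}


@app.get("/api/v1/jobs/{job_id}")
async def job_status(job_id: str):
    # get_status() returns e.g. "queued", "started", "finished" or "failed".
    job = Job.fetch(job_id, connection=redis_conn)
    return {"job_id": job_id, "status": job.get_status()}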

Upvotes: 1

fchancel

Reputation: 2699

I currently see two possibilities.

First, you can increase the waiting time before the timeout. If you use Gunicorn, you can pass -t INT or --timeout INT, where the value is a positive number or 0; setting it to 0 disables timeouts for all workers entirely, which effectively means infinite timeouts.
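
For example, a Heroku Procfile entry along these lines would raise the Gunicorn worker timeout (assuming the FastAPI app object lives in main.py and runs under Gunicorn with Uvicorn workers; the module name, worker count and 300-second value are only placeholders):

web: gunicorn -w 2 -k uvicorn.workers.UvicornWorker --timeout 300 main:app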

Second, you can use an asynchronous request/response pattern: respond immediately with a 202 and tell the client where they can track the status of the task. This requires creating a new endpoint, new logic, etc.
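
A minimal sketch of that pattern, using FastAPI's built-in BackgroundTasks and an in-memory status store; the task endpoints, the tasks dict and _run_and_track are made up for illustration, _save_file_to_disk is the helper from the question, and a real deployment would hand the work to a queue/worker as described in the other answer:

import uuid
import fastapi as _fastapi

from scraper import run_scraper

app = _fastapi.FastAPI()
tasks = {}  # task_id -> "running" | "done" | "failed"


async def _run_and_track(task_id: str, temp_file: str):
    # Runs after the 202 response has been sent, so the client never waits on it.
    try:
        await run_scraper(temp_file)
        tasks[task_id] = "done"
    except Exception:
        tasks[task_id] = "failed"


@app.post("/api/v1/scraped_csv", status_code=202)
async def extract_ads(
    background_tasks: _fastapi.BackgroundTasks,
    csv_file: _fastapi.UploadFile = _fastapi.File(...),
):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    task_id = str(uuid.uuid4())
    tasks[task_id] = "running"
    background_tasks.add_task(_run_and_track, task_id, temp_file)
    # Respond immediately and tell the client where to poll for the status.
    return {"task_id": task_id, "status_url": f"/api/v1/tasks/{task_id}"}


@app.get("/api/v1/tasks/{task_id}")
async def task_status(task_id: str):
    return {"task_id": task_id, "status": tasks.get(task_id, "unknown")}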

Upvotes: 1
