Reputation: 53606
The need is to upload a file to a FastAPI endpoint, convert it to Markdown, and save the text to Redis (files are up to 4MB in size).
The only logic I have found so far is to upload the file as UploadFile, read the contents, save them to disk with the right extension, pass that path to the MarkItDown library, read that markdown file again, and then pass it to Redis. Way too much I/O. Is there a way to do all of this in memory?
(For the sake of code simplicity, I removed all error handling and I assume only text files)
from fastapi import UploadFile, File
from tempfile import NamedTemporaryFile
from markitdown import MarkItDown
import os

# (`router` and the `redis` client are defined elsewhere)

@router.post("/upload")
async def uploadPost(filepond: UploadFile = File()):
    """
    Convert a textual file to markdown.
    Store in Redis.
    """
    # Create a temporary file to save the uploaded content
    # (for the sake of simplicity I use .txt for everything)
    with NamedTemporaryFile(delete=False, suffix=".txt") as temp_file:
        temp_file_path = temp_file.name
        content = await filepond.read()
        temp_file.write(content)

    md = MarkItDown()
    result = md.convert(temp_file_path)
    redis.setex("some key", 3600, result.text_content)
    os.remove(temp_file_path)
Upvotes: 1
Views: 123
Reputation: 1
from fastapi import APIRouter, UploadFile, File
from markitdown import MarkItDown
import aioredis
import io

router = APIRouter()

# Initialize Redis client
redis = aioredis.from_url("redis://localhost", decode_responses=True)

@router.post("/upload")
async def upload_post(file: UploadFile = File(...)):
    """
    Converts an uploaded text file to Markdown and stores the result in Redis.
    """
    content = await file.read()  # Read file content into memory
    md = MarkItDown()
    # convert() expects a path or URL, not raw text, so wrap the bytes in a
    # BytesIO and use convert_stream() instead (the file_extension hint and
    # exact signature may vary across markitdown versions)
    result = md.convert_stream(io.BytesIO(content), file_extension=".txt")
    await redis.setex("markdown_content", 3600, result.text_content)  # Store in Redis with expiration
    return {"message": "File successfully processed and stored in Redis"}
Upvotes: -2
Reputation: 34551
It seems that you are limited by the library you are currently using, not FastAPI, which offers a way to get the request body in chunks as they arrive (using request.stream() instead of UploadFile); see this answer and this answer.
The library you are using includes a convert_stream() method, but it doesn't seem to do what the name implies. The stream parameter (which doesn't have a definite type) is used to read the entire contents at once and simply store them in a temporary file (essentially, similar to your current approach).
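To make that concrete, here is a minimal sketch of calling convert_stream() directly on the uploaded bytes. Note that the file_extension hint and the exact signature may vary across markitdown versions, and, as described above, the contents still end up in a temporary file internally, so this is not truly "in memory":

import io
from markitdown import MarkItDown

md = MarkItDown()
content = b"some uploaded bytes"  # e.g., the result of `await file.read()`
# convert_stream() accepts a file-like object, but simply reads it whole
# and buffers it to a temporary file before converting
result = md.convert_stream(io.BytesIO(content), file_extension=".txt")
print(result.text_content)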
Given the limitations of the library, you might still benefit from using request.stream() to write the chunks as they arrive to a NamedTemporaryFile directly (even though, with files of up to 4MB in size as you mentioned, the gain might not be that noticeable), compared to using UploadFile, which would store files larger than 1MB in a SpooledTemporaryFile whose contents you then need to read back, as explained in this answer. Hence, you would at least avoid writing to and reading from two temporary files unnecessarily, as shown in the example provided in your question. Similar examples can be found here, as well as here and here.
from fastapi import FastAPI, Request, HTTPException
from fastapi.concurrency import run_in_threadpool
from markitdown import MarkItDown
import aiofiles
import os

app = FastAPI()

@app.post('/upload')
async def upload(request: Request):
    temp_path = None
    try:
        async with aiofiles.tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as temp:
            temp_path = temp.name
            # Write the request body to the temporary file chunk by chunk,
            # as it arrives
            async for chunk in request.stream():
                await temp.write(chunk)
        # Have the blocking `convert` function run in an external
        # ThreadPool/ProcessPool, in order to avoid blocking the event loop
        md = MarkItDown()
        res = await run_in_threadpool(md.convert, temp_path)
        return {'markdown': res.text_content}
    except Exception:
        raise HTTPException(status_code=500, detail='Something went wrong')
    finally:
        if temp_path is not None:
            os.remove(temp_path)
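Note that, since request.stream() yields the raw request body, the endpoint above assumes the client sends the file bytes directly, rather than as multipart/form-data. A hypothetical client call (the URL and filename are placeholders):

import requests

# POST the raw file bytes (no multipart encoding), matching what the
# endpoint expects when iterating over request.stream()
with open("example.txt", "rb") as f:
    r = requests.post("http://localhost:8000/upload", data=f)
print(r.json())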
Upvotes: 1
Reputation: 45
What do you think about using tmpfs to store a temporary file in memory?
I think libraries like "memory-tempfile" are worth considering.
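For instance, a minimal sketch assuming a Linux host where /dev/shm is a tmpfs (RAM-backed) mount; the only change from the question's code is the dir argument, so MarkItDown still gets a real path to read from while the bytes never touch the disk:

from tempfile import NamedTemporaryFile
from markitdown import MarkItDown

# /dev/shm is a tmpfs mount on most Linux systems, so the temporary
# "file" lives in memory rather than on disk
with NamedTemporaryFile(suffix=".txt", dir="/dev/shm") as temp_file:
    temp_file.write(b"uploaded content goes here")  # e.g., `await filepond.read()`
    temp_file.flush()
    result = MarkItDown().convert(temp_file.name)
print(result.text_content)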
Upvotes: 0