Reputation: 11
I'm trying to build a backend where I can upload an audio file and use Whisper AI to transcribe it, but transcribe accepts type np.ndarray and the uploaded audio file arrives as bytes, so I'm not sure how to convert bytes -> ndarray.
I'm using Postman to send an audio file to this backend, but I need to convert the bytes to an ndarray before I can call Whisper's transcribe method, and I'm not sure how to do that.
import numpy as np
import whisper
from typing import Annotated
from fastapi import FastAPI, File

app = FastAPI()

@app.post("/abcd")
async def transcribe_audio(audio_file_upload: Annotated[bytes, File()]):
    model = whisper.load_model("base")
    result = model.transcribe(audio_file_upload, word_timestamps=True, fp16=True)
    return {"transcription": result}
Error
TypeError: expected np.ndarray (got bytes)
I tried using
audio_data = np.frombuffer(audio_file_upload, dtype=np.float32)
but got
ValueError: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]])
but I'm a beginner with NumPy, so I'm not sure how to implement this correctly.
Upvotes: 1
Views: 1322
Reputation: 11
The line below converts the audio bytes to an ndarray. Note the np.int16 dtype, which matches the / 32768.0 normalization for 16-bit PCM samples (as in the linked discussion):
aud_array = np.frombuffer(audio_file_upload, np.int16).flatten().astype(np.float32) / 32768.0
@app.post("/abcd")
async def transcribe_audio(audio_file_upload: Annotated[bytes, File()]):
    # Interpret the uploaded bytes as 16-bit PCM samples and normalize to float32 in [-1.0, 1.0]
    aud_array = np.frombuffer(audio_file_upload, np.int16).flatten().astype(np.float32) / 32768.0
    model = whisper.load_model("base")
    result = model.transcribe(aud_array, word_timestamps=True, fp16=True)
    return {"transcription": result}
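For context, this conversion assumes the upload is raw (headerless) 16-bit PCM mono audio at the 16 kHz sample rate Whisper expects: np.frombuffer reinterprets the bytes as int16 samples, and dividing by 32768.0 scales them into the [-1.0, 1.0] float32 range. A minimal standalone sketch of the same conversion, using made-up sample values purely for illustration:
import numpy as np

# Three made-up 16-bit PCM sample values, serialized to bytes the way an upload would arrive
pcm_bytes = np.array([0, 16384, -32768], dtype=np.int16).tobytes()

# Reinterpret the bytes as int16 samples and normalize to float32 in [-1.0, 1.0]
aud_array = np.frombuffer(pcm_bytes, np.int16).flatten().astype(np.float32) / 32768.0

print(aud_array)  # [ 0.   0.5 -1. ]
If the upload is a container or compressed format such as WAV or MP3, the bytes also contain header or encoded data, so you would typically decode it first, for example by writing it to a temporary file and passing the path to model.transcribe or whisper's ffmpeg-based whisper.load_audio.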
Credit: https://github.com/openai/whisper/discussions/216#discussioncomment-3779531
Upvotes: 0