Reputation: 21
I want to process streaming audio (coming from a person speaking on the remote end of a WebRTC peer connection) to detect when the person is done talking. I have the audio track and access to individual frames. I see that each frame can be converted to an ndarray using AudioFrame.to_ndarray(). I can also see the values in the ndarray changing depending on what the person is saying, at what pitch, at what volume, etc. Now I want to detect silence on the stream. My question is: what is in the ndarray, and how can I make sense of the data?
while True:
    try:
        frame: AudioFrame = await track.recv()  # aiortc delivers an av.AudioFrame
        frame_nd_array = frame.to_ndarray()
    except MediaStreamError:  # raised by aiortc when the track ends
        break
Where can I learn what is in the frame_nd_array?
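For context: with aiortc, track.recv() returns an av.AudioFrame, and for the default packed 16-bit PCM ("s16") format to_ndarray() yields an int16 array of shape (1, samples * channels) with channel samples interleaved. Below is a minimal sketch of detecting end-of-speech by counting consecutive low-energy (RMS) frames; the threshold and frame-count constants are illustrative assumptions to tune, not values taken from aiortc.

import numpy as np

# Illustrative, tunable constants (assumptions, not library defaults).
SILENCE_RMS_THRESHOLD = 300      # int16 samples range from -32768 to 32767
SILENCE_FRAMES_TO_STOP = 50      # roughly 1 s of 20 ms frames

def frame_is_silent(frame) -> bool:
    """True when the frame's RMS energy falls below the silence threshold."""
    # With packed "s16" audio, to_ndarray() has shape (1, samples * channels).
    samples = frame.to_ndarray().astype(np.float32)
    rms = np.sqrt(np.mean(samples ** 2))
    return rms < SILENCE_RMS_THRESHOLD

async def wait_until_done_talking(track) -> None:
    """Consume frames until a long enough run of silent frames is seen."""
    silent_run = 0
    while silent_run < SILENCE_FRAMES_TO_STOP:
        frame = await track.recv()
        silent_run = silent_run + 1 if frame_is_silent(frame) else 0

For speech specifically, a dedicated voice-activity detector (e.g. py-webrtcvad) is usually more robust than a raw energy threshold, but the RMS check above is enough to see how the sample values relate to silence.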
Upvotes: 2
Views: 170