DaHoopster

Reputation: 602

How do I use the NumPy array passed by the sounddevice Stream class callback to perform additional processing?

I am using the Python library sounddevice for some audio processing. When I use the Stream class to pass input data collected from the input device (a microphone) through to the output, the callback function receives a NumPy array that represents the sound data:

def callback(indata, outdata, frames, time, status):
    outdata[:] = indata

indata is a NumPy array that contains arrays of floats. What do these floats represent, and how can I perform time stretching or pitch shifting on this data?
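For context, a minimal way to wire this callback into a Stream looks roughly like this (the sample rate, channel count, and run duration are arbitrary example values):

import sounddevice as sd

def callback(indata, outdata, frames, time, status):
    if status:
        print(status)      # report any under- or overflows
    outdata[:] = indata    # simple pass-through

# samplerate and channels are example values; adjust them for your device
with sd.Stream(samplerate=44100, channels=2, callback=callback):
    sd.sleep(5000)         # keep the stream running for five seconds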

Upvotes: 2

Views: 2099

Answers (1)

Matthias

Reputation: 4894

First of all, a warning: If you want efficient and reliable realtime audio processing, Python is probably not a very good choice (it is an interpreted language, it uses garbage collection, and of course there is the infamous GIL).

If you want to use Python anyway, there are a few libraries for realtime audio signal processing; pyo and LibROSA come to mind, and more can be found at the PythonInMusic wiki page.
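For example, LibROSA provides offline (non-realtime) helpers for exactly the operations you mention; a rough sketch, where the file path and parameter values are only placeholders:

import librosa

# 'example.wav' is a placeholder path; librosa.load returns a mono float array and the sample rate
y, sr = librosa.load('example.wav', sr=None)

# stretch to 1.5x speed without changing pitch, and shift the pitch up by 4 semitones
y_stretched = librosa.effects.time_stretch(y, rate=1.5)
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)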

Now to answer your actual question: The floating-point values (float32 by default) in indata are amplitude values that represent sound pressure. You can also think of them as voltages, if that helps.

Note that the values coming from the sound card are limited to a range from -1.0 to +1.0. When you write your output signal(s) to outdata, you have to take care that they are also limited to this range, otherwise you will hear ugly distortions.
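For example, if you apply a gain inside the callback, you can clamp the result before writing it out (the gain factor here is arbitrary):

import numpy as np

def callback(indata, outdata, frames, time, status):
    boosted = indata * 2.0                    # arbitrary example gain
    outdata[:] = np.clip(boosted, -1.0, 1.0)  # clamp to the valid range to avoid distortion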

The indata and outdata arrays are 2-dimensional, where the columns represent the audio channels. Each row represents one time instance.
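For example, with a stereo stream (two channels assumed here), you can address the channels like this:

# indata.shape is (frames, channels), e.g. (512, 2) for a stereo block
left = indata[:, 0]      # all frames of the first channel
right = indata[:, 1]     # all frames of the second channel
outdata[:, 0] = right    # swap the channels as a trivial example
outdata[:, 1] = left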

You might also want to read my little page about creating simple audio signals with Python.

The actual algorithms for time stretching/pitch shifting are off-topic here; if you need help with them, you can ask at https://dsp.stackexchange.com/.

Upvotes: 3
