jacanterbury

Reputation: 1505

slow down spoken audio (not from mp3/wav) using python

I need to slow down short bursts of spoken audio captured from a mic and play them back in real time from a Python script. I can capture and play back audio fine, without changing the speed, using an input and an output stream in PyAudio, but I can't work out how to slow it down.

I've seen this post, which uses pydub to do something similar for audio from a file, but I can't work out how to modify it for my purposes.

Just to stress the key point from the question title, "(not from mp3/wav or any other file type)": I want to do this in real time with short bursts, ideally <= ~0.1 s, so I just want to work with data read in from a PyAudio stream.

Does anyone who has experience with pydub know if it might do what I need?

NB I realise that the output would lag further and further behind and that there might be buffering issues; however, I'm only doing this for short bursts of up to 30 seconds and only want to slow the speech down by ~10%.

Upvotes: 1

Views: 7157

Answers (4)

jacanterbury

Reputation: 1505

So it turns out it was very very simple.

Once I looked into the pydub and PyAudio code bases, I realised that by simply specifying a lower value for the 'rate' parameter on the output stream (speaker) than on the input stream (mic), stream.write() would handle it for me: the output device just plays the same samples back more slowly.

I had been expecting that the raw data would have to be physically manipulated to transform it into a larger buffer.

Here's a simple example:

import pyaudio

FORMAT = pyaudio.paInt16
CHANNELS = 1
FRAME_RATE = 44100
CHUNK = 1024*4

# FRAMERATE_OFFSET scales the 'rate' of the output stream, which sets the playback speed
# < 1 => slow down;  > 1 => speed up (the pitch changes by the same factor)
FRAMERATE_OFFSET = 0.8

audio = pyaudio.PyAudio()

# output stream (speaker), opened at the scaled rate
stream_out = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=int(FRAME_RATE * FRAMERATE_OFFSET),
                        output=True)

# open input stream (mic) at the normal rate to start recording
stream_in = audio.open(format=FORMAT,
                       channels=CHANNELS,
                       rate=FRAME_RATE,
                       input=True)

for i in range(1):
    # modify the chunk multiplier below to capture longer durations
    # (CHUNK*25 frames is about 2.3 s at 44100 Hz)
    data = stream_in.read(CHUNK*25)
    stream_out.write(data)

stream_in.stop_stream()
stream_in.close()
stream_out.stop_stream()
stream_out.close()
audio.terminate()
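A quick way to see what the rate trick does, as a minimal sketch that needs no sound card: the same number of frames played at a lower rate lasts 1/FRAMERATE_OFFSET times as long, and because every cycle of the waveform is stretched by that factor, the pitch drops too. The helpers `playback_seconds` and `pitch_shift_semitones` are hypothetical names for illustration; the constants are taken from the answer's code.

```python
import math


def playback_seconds(num_frames: int, rate: float) -> float:
    """Seconds needed to play num_frames frames at the given sample rate (hypothetical helper)."""
    return num_frames / rate


def pitch_shift_semitones(rate_factor: float) -> float:
    """Pitch change caused by playing audio at rate_factor times its capture rate (hypothetical helper)."""
    return 12 * math.log2(rate_factor)


FRAME_RATE = 44100
CHUNK = 1024 * 4
FRAMERATE_OFFSET = 0.8

frames = CHUNK * 25                             # what stream_in.read(CHUNK*25) returns
out_rate = int(FRAME_RATE * FRAMERATE_OFFSET)   # the rate passed to stream_out

captured = playback_seconds(frames, FRAME_RATE)
played = playback_seconds(frames, out_rate)
print(f"captured {captured:.2f} s, played back over {played:.2f} s")
print(f"pitch change: {pitch_shift_semitones(FRAMERATE_OFFSET):.1f} semitones")
```

With the ~10% slowdown from the question (a factor of 0.9) the pitch drop is smaller, roughly 1.8 semitones, but it is still there; keeping the pitch constant would need a time-stretching algorithm rather than a rate change.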

To make this operational I'll need to set up a shared-memory data buffer and a subprocess to handle the output, so that I don't miss anything significant from the input signal.
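A lighter-weight alternative to a shared-memory buffer plus subprocess, swapped in here and not the author's design, is one reader thread and one writer thread joined by a `queue.Queue`. This is a minimal sketch: `read_chunk` and `write_chunk` are hypothetical stand-ins for `stream_in.read(CHUNK)` and `stream_out.write(data)`, so it runs without audio hardware. It assumes PyAudio's blocking calls let other threads run while they wait, which is worth verifying before relying on it.

```python
import queue
import threading
from typing import Callable, Optional


def run_pipeline(read_chunk: Callable[[], bytes],
                 write_chunk: Callable[[bytes], None],
                 n_chunks: int) -> None:
    """Capture n_chunks with read_chunk on one thread and play them with
    write_chunk on another, so a slow writer never stalls the reader."""
    buf: "queue.Queue[Optional[bytes]]" = queue.Queue()  # unbounded FIFO between threads

    def reader() -> None:
        for _ in range(n_chunks):
            buf.put(read_chunk())
        buf.put(None)  # sentinel: no more audio is coming

    def writer() -> None:
        while True:
            chunk = buf.get()
            if chunk is None:
                break
            write_chunk(chunk)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Hypothetical stand-ins for the PyAudio calls, for illustration only.
fake_chunks = [bytes([i]) * 4 for i in range(5)]
source = iter(fake_chunks)
played = []
run_pipeline(lambda: next(source), played.append, n_chunks=len(fake_chunks))
print(played == fake_chunks)
```

The queue is where the lag the question mentions accumulates: at a 0.9 speed factor, 30 s of captured speech takes about 33.3 s to play, so roughly 3.3 s of audio is still waiting in the queue when capture stops.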

Upvotes: 3

prosti

Reputation: 46301

Here is what I did.

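import wave

multiplier = 0.2  # < 1 slows playback down (0.2 => 5x slower); > 1 speeds it up

spf = wave.open('flute-a4.wav', 'rb')
channels = spf.getnchannels()  # copy the source layout instead of hard-coding it
swidth = spf.getsampwidth()
fr = spf.getframerate()  # frame rate
signal = spf.readframes(-1)
spf.close()

wf = wave.open('ship.wav', 'wb')
wf.setnchannels(channels)
wf.setsampwidth(swidth)
wf.setframerate(fr * multiplier)  # only the header changes; the samples are copied as-is
wf.writeframes(signal)
wf.close()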

I used flute from this repo.
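The same header-only trick can be done entirely in memory, which fits the question's "not from a file" requirement. This is a minimal sketch assuming you already have WAV bytes (for example, PyAudio frames wrapped with the `wave` module); `retag_frame_rate` is a hypothetical helper name, and `io.BytesIO` stands in for the files used above.

```python
import io
import wave


def retag_frame_rate(wav_bytes: bytes, multiplier: float) -> bytes:
    """Return a copy of an in-memory WAV whose header claims
    frame_rate * multiplier; the sample data itself is untouched."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)   # copy the source layout
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(int(params.framerate * multiplier))
        dst.writeframes(frames)
    return out.getvalue()


# Build a 1-second, 8 kHz, mono, 16-bit silent WAV in memory to try it on.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

slowed = retag_frame_rate(buf.getvalue(), 0.5)
with wave.open(io.BytesIO(slowed), "rb") as r:
    rate, seconds = r.getframerate(), r.getnframes() / r.getframerate()
print(rate, seconds)  # 4000 2.0
```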

Upvotes: 2

Arthur Grigoryan

Reputation: 467

This question has been answered here.

from pydub import AudioSegment
sound = AudioSegment.from_file(…)

def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })

    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)

slow_sound = speed_change(sound, 0.75)
fast_sound = speed_change(sound, 2.0)
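The pydub recipe boils down to two steps: re-tag the frame rate, then resample back to the original rate (`set_frame_rate` does the resampling). The sketch below applies the same idea directly to a raw PyAudio chunk, so the output stream could stay at the normal `FRAME_RATE`; this may help if a sound card rejects an unusual rate such as 35280 Hz. It is a minimal sketch assuming mono, native-endian 16-bit samples (`paInt16`); `speed_change_raw` is a hypothetical name, and it swaps in plain linear interpolation rather than pydub's own resampler, so quality will differ.

```python
from array import array


def speed_change_raw(raw: bytes, speed: float) -> bytes:
    """Resample mono 16-bit PCM so it lasts 1/speed times as long when played
    at the original rate (speed < 1 slows down). Pitch shifts with speed,
    just as in the pydub recipe."""
    src = array("h", raw)                 # native-endian signed 16-bit samples
    n_out = int(len(src) / speed)
    out = array("h")
    for i in range(n_out):
        pos = i * speed                   # fractional read position in the source
        j = int(pos)
        frac = pos - j
        a = src[j]
        b = src[j + 1] if j + 1 < len(src) else a  # hold the last sample at the edge
        out.append(int(round(a + (b - a) * frac)))
    return out.tobytes()


slow = speed_change_raw(array("h", [0, 100, 200, 300]).tobytes(), 0.5)
print(array("h", slow).tolist())  # [0, 50, 100, 150, 200, 250, 300, 300]
```

For stereo input the interleaved left/right samples would have to be split and resampled separately; the sketch does not handle that.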

Upvotes: 0

Anil_M

Reputation: 11443

As mentioned in the comments, you can speed up or slow down audio simply by increasing or decreasing the sampling frequency / frame rate. If you are planning to do this from a microphone in real time, one idea is to record in chunks of a few seconds, play the slowed-down audio, and then move on to recording again.

Here's an example using sounddevice, which is basically a slight modification of my answer here. We record audio for 4 seconds, 3 times in a loop, and play each chunk back immediately with a frame rate offset (> 1 to speed up, < 1 to slow down). sd.wait() blocks until playback has finished, and a further 1-second pause is added before we start recording the next chunk.

import sounddevice as sd
import time

fs = 44100
duration = 4  # seconds
#fs_offset = 1.3  # speed up
fs_offset = 0.8  # slow down

for count in range(1, 4):
    myrecording = sd.rec(duration * fs, samplerate=fs, channels=2, dtype='float64')
    print("Recording Audio chunk {} for {} seconds".format(count, duration))
    sd.wait()  # block until the recording is complete
    print("Recording complete, Playing chunk {} with offset {} ".format(count, fs_offset))
    sd.play(myrecording, fs * fs_offset)  # play at the scaled sample rate
    sd.wait()  # block until playback is complete
    print("Playing chunk {} Complete".format(count))
    time.sleep(1)

Output:

$python sdaudio.py
Recording Audio chunk 1 for 4 seconds
Recording complete, Playing chunk 1 with offset 0.8 
Playing chunk 1 Complete
Recording Audio chunk 2 for 4 seconds
Recording complete, Playing chunk 2 with offset 0.8 
Playing chunk 2 Complete
Recording Audio chunk 3 for 4 seconds
Recording complete, Playing chunk 3 with offset 0.8 
Playing chunk 3 Complete
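A rough timing sketch for this record-then-play loop: each pass spends `duration` seconds recording, `duration / fs_offset` seconds playing, and 1 second sleeping, so with the values above a pass takes about 10 s and the mic is live only about 40% of the time. Speech spoken during playback is not captured. The helper names below are hypothetical.

```python
def cycle_seconds(record_s: float, fs_offset: float, pause_s: float) -> float:
    """Wall-clock length of one record / play / sleep pass of the loop above."""
    playback_s = record_s / fs_offset  # playing at fs * fs_offset stretches the chunk
    return record_s + playback_s + pause_s


def capture_fraction(record_s: float, fs_offset: float, pause_s: float) -> float:
    """Share of wall-clock time during which the microphone is recording."""
    return record_s / cycle_seconds(record_s, fs_offset, pause_s)


for offset in (0.8, 0.9):
    total = cycle_seconds(4, offset, 1)
    share = capture_fraction(4, offset, 1)
    print(f"offset {offset}: {total:.2f} s per pass, mic live {share:.0%} of the time")
```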

Here's an example using PyAudio for recording from the microphone and pydub for playback. You could also use PyAudio's blocking "wire" approach (reading from an input stream and writing straight to an output stream) to modify the outgoing audio; I used pydub since you referred to a pydub-based solution. This is a modification of code from here.

import pyaudio
import wave
from pydub import AudioSegment
from pydub.playback import play
import time

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4
#FRAMERATE_OFFSET = 1.4  #speedup
FRAMERATE_OFFSET = 0.7 #slowdown
WAVE_OUTPUT_FILENAME = "file.wav"

def get_audio():

    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    #save to file
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE * FRAMERATE_OFFSET)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()


for count in range(1, 4):
    print("recording segment {} ....".format(count))
    get_audio()  # records RECORD_SECONDS of audio into WAVE_OUTPUT_FILENAME
    print("Playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    audio_chunk = AudioSegment.from_wav(WAVE_OUTPUT_FILENAME)
    play(audio_chunk)  # blocks until the slowed-down segment has finished
    print("Finished playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    time.sleep(1)

Output:

$python slowAudio.py 
recording segment 1 ....
Playing segment 1 .... at offset 0.7
Finished playing segment 1 .... at offset 0.7
recording segment 2 ....
Playing segment 2 .... at offset 0.7
Finished playing segment 2 .... at offset 0.7
recording segment 3 ....
Playing segment 3 .... at offset 0.7
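One detail of `get_audio()` worth checking when sizing buffers: the loop count `int(RATE / CHUNK * RECORD_SECONDS)` truncates, so each segment is slightly shorter than 4 s. A small arithmetic sketch using the answer's constants (the byte count assumes `paInt16`, i.e. 2 bytes per sample):

```python
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample for paInt16

reads = int(RATE / CHUNK * RECORD_SECONDS)  # iterations of the recording loop
frames = reads * CHUNK                      # frames actually captured
seconds = frames / RATE
n_bytes = frames * CHANNELS * SAMPLE_WIDTH  # size of b''.join(frames)
print(reads, frames, round(seconds, 3), n_bytes)  # 172 176128 3.994 704512
```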

Upvotes: 0
