John Terry

Editing a WAV file using Python

Between each word in the wav file I have full silence (I checked with Hex workshop and silence is represented with 0's).

How can I cut out the non-silent sound?

I'm programming using python.

Thanks!

Upvotes: 5

Views: 28143

Answers (6)

ffhaddad

Reputation: 1723

I have been researching this topic for a project I'm working on, and I came across a few problems with the solution provided; namely, its method for determining silence is incorrect. A "more correct" implementation would be:

import struct
import wave

wave_file = wave.open("sound_file.wav", "r")

for i in range(wave_file.getnframes()):
    # read a single frame and advance to next frame
    current_frame = wave_file.readframes(1)

    # check for silence
    silent = True
    # wave frame samples are stored in little endian**
    # this example works for a single channel 16-bit per sample encoding
    unpacked_signed_value = struct.unpack("<h", current_frame) # *
    if abs(unpacked_signed_value[0]) > 500:
        silent = False

    if silent:
        print("Frame %s is silent." % wave_file.tell())
    else:
        print("Frame %s is not silent." % wave_file.tell())

References and Useful Links

*Struct Unpacking will be useful here: https://docs.python.org/2/library/struct.html

**A good reference I found explaining the format of wave files for dealing with different size bit-encodings and multiple channels is: http://www.piclist.com/techref/io/serial/midi/wave.html

Using the built-in ord() function in Python on the first element of the string object returned by the readframes(x) method will not work correctly: for a 16-bit sample it looks at only the low-order byte and ignores the sign.

Another key point is that multiple channel audio is interleaved and thus a little extra logic is needed for dealing with channels. Again, the link above goes into detail about this.

Hopefully this helps someone in the future.

Here are some of the more important points from the link, and what I found helpful.

Data Organization


All data is stored in 8-bit bytes, arranged in Intel 80x86 (ie, little endian) format. The bytes of multiple-byte values are stored with the low-order (ie, least significant) bytes first. Data bits are as follows (ie, shown with bit numbers on top):

         7  6  5  4  3  2  1  0
       +-----------------------+
 char: | lsb               msb |
       +-----------------------+

         7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8
       +-----------------------+-----------------------+
short: | lsb     byte 0        |       byte 1      msb |
       +-----------------------+-----------------------+

         7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
       +-----------------------+-----------------------+-----------------------+-----------------------+
 long: | lsb     byte 0        |       byte 1          |         byte 2        |       byte 3      msb |
       +-----------------------+-----------------------+-----------------------+-----------------------+
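
As a small illustration of the byte order described above, here is a sketch (written for Python 3) that decodes the same two bytes as little endian and, purely for contrast, as big endian. The variable names are made up for the example.

```python
import struct

# Two bytes as they might appear in a 16-bit WAV sample.
raw = b"\xe2\xff"

# "<h" = little-endian signed 16-bit: low byte first, so the value is 0xFFE2.
little = struct.unpack("<h", raw)[0]

# ">h" = big-endian, shown only for contrast: the value would be 0xE2FF.
big = struct.unpack(">h", raw)[0]

# int.from_bytes gives the same little-endian result without struct.
same = int.from_bytes(raw, byteorder="little", signed=True)

print(little, big, same)
```

Reading the bytes in the wrong order gives a completely different amplitude, which is why the "<" prefix matters when unpacking WAV samples.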

Interleaving


For multichannel sounds (for example, a stereo waveform), single sample points from each channel are interleaved. For example, assume a stereo (ie, 2 channel) waveform. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "mix" the two channels' sample points together. You would store the first sample point of the left channel. Next, you would store the first sample point of the right channel. Next, you would store the second sample point of the left channel. Next, you would store the second sample point of the right channel, and so on, alternating between storing the next sample point of each channel. This is what is meant by interleaved data; you store the next sample point of each of the channels in turn, so that the sample points that are meant to be "played" (ie, sent to a DAC) simultaneously are stored contiguously.
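
To make the interleaving concrete, here is a minimal sketch (Python 3) that splits interleaved 16-bit samples into one list per channel. The function name split_channels is hypothetical, and the sketch assumes 16-bit PCM only.

```python
import struct


def split_channels(frames, n_channels, sample_width=2):
    """Split interleaved 16-bit PCM bytes into one list of samples per channel."""
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    n_samples = len(frames) // sample_width
    # Unpack every sample as a little-endian signed 16-bit integer.
    samples = struct.unpack("<%dh" % n_samples, frames[:n_samples * sample_width])
    # Sample i belongs to channel i % n_channels.
    return [list(samples[c::n_channels]) for c in range(n_channels)]


# Two stereo frames: left=1, right=-1, then left=2, right=-2.
data = struct.pack("<4h", 1, -1, 2, -2)
left, right = split_channels(data, 2)
print(left, right)
```

With the channels separated, a silence check can be applied to each channel on its own, or a frame can be treated as silent only when every channel is below the threshold.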

Upvotes: 7

Soviut

Reputation: 91635

Python has a wave module. You can use it to open a wav file for reading and use the `readframes(1)` method to walk through the file frame by frame.

import wave
w = wave.open('beeps.wav', 'r')
for i in range(w.getnframes()):
    frame = w.readframes(1)

The frame returned will be a byte string with hex values in it. If the file is stereo the result will look something like this (4 bytes):

'\xe2\xff\xe2\xff'

If it's mono, it will have half the data (2 bytes):

'\xe2\xff'

Each channel is 2 bytes long because the audio is 16-bit. If it is 8-bit, each channel will only be one byte. You can use the getsampwidth() method to determine this. Also, getnchannels() will tell you whether it's mono or stereo.
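
Here is a rough sketch (Python 3) of how those two methods relate to the size of a frame. It builds a tiny stereo 16-bit file in memory so it can run without an input file; the frame contents and the 8000 Hz rate are arbitrary assumptions for the example.

```python
import io
import wave

# Build a small in-memory stereo 16-bit WAV so the sketch needs no file on disk.
buf = io.BytesIO()
out = wave.open(buf, "wb")
out.setnchannels(2)
out.setsampwidth(2)      # 2 bytes per sample = 16-bit
out.setframerate(8000)   # arbitrary rate for the example
out.writeframes(b"\xe2\xff\xe2\xff" * 10)  # 10 identical stereo frames
out.close()

# Read it back and inspect the layout.
buf.seek(0)
w = wave.open(buf, "rb")
n_channels = w.getnchannels()
sample_width = w.getsampwidth()
n_frames = w.getnframes()
frame = w.readframes(1)
w.close()

# One frame holds one sample per channel.
frame_size = n_channels * sample_width
print(n_channels, sample_width, n_frames, len(frame), frame_size)
```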

You can loop over these bytes to see if they all equal zero, meaning both channels are silent. In the following example I use the ord() function to convert the '\xe2' hex values to integers.

import wave
w = wave.open('beeps.wav', 'r')
for i in range(w.getnframes()):
    # read 1 frame; the read position advances automatically
    frame = w.readframes(1)

    all_zero = True
    for j in range(len(frame)):
        # check if the byte is non-zero; slicing keeps ord() working
        # on both Python 2 strings and Python 3 bytes
        if ord(frame[j:j+1]) > 0:
            all_zero = False
            break

    if all_zero:
        # perform your cut here
        print('silence found at frame %s' % w.tell())
        print('silence found at second %s' % (w.tell() / w.getframerate()))

It is worth noting that a single frame of silence doesn't necessarily denote empty space, since the amplitude may cross the 0 mark in normal audio. Therefore, it is recommended that a certain number of consecutive frames at 0 be observed before deciding that the region is, in fact, silent.
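
One possible way to apply that rule is sketched below (Python 3). The helper find_silent_runs and its min_frames parameter are hypothetical names, and the sketch assumes silence is exactly zero, as in the question.

```python
import struct


def find_silent_runs(frames, frame_size, min_frames):
    """Return (start, end) frame index pairs for runs of all-zero frames
    that are at least min_frames long. The end index is exclusive."""
    runs = []
    n_frames = len(frames) // frame_size
    zero_frame = bytes(frame_size)
    run_start = None
    for i in range(n_frames):
        is_zero = frames[i * frame_size:(i + 1) * frame_size] == zero_frame
        if is_zero and run_start is None:
            run_start = i                      # a new run of zeros begins
        elif not is_zero and run_start is not None:
            if i - run_start >= min_frames:    # keep only long enough runs
                runs.append((run_start, i))
            run_start = None
    # Handle a run that extends to the end of the data.
    if run_start is not None and n_frames - run_start >= min_frames:
        runs.append((run_start, n_frames))
    return runs


# Mono 16-bit samples: a lone zero at index 1 is ignored, longer runs are reported.
samples = [5, 0, 7, 0, 0, 0, 9, 0, 0, 0, 0]
data = struct.pack("<%dh" % len(samples), *samples)
runs = find_silent_runs(data, frame_size=2, min_frames=3)
print(runs)
```

Here frame_size would be getsampwidth() * getnchannels() for a real file, and min_frames would be chosen from the frame rate (for example, a fraction of a second's worth of frames).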

Upvotes: 21

Kylotan

Reputation: 18449

You will need to come up with a threshold: a minimum number of consecutive zeros before you cut them. Otherwise you'll be removing perfectly valid zeros from the middle of normal audio data. You can iterate through the wave file, copying any non-zero values and buffering up zero values. When you're buffering zeros and eventually come across the next non-zero value, copy the buffer over if it has fewer samples than the threshold; otherwise discard it.
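
The buffering approach described above could be sketched like this (Python 3). The names strip_silence and strip_silence_wav are hypothetical; the sketch works on whole frames, treats only exact zeros as silence, and reads the entire file into memory, which may not suit very large files.

```python
import io
import struct
import wave


def strip_silence(frames, frame_size, threshold):
    """Drop runs of all-zero frames that are at least `threshold` frames long."""
    zero_frame = bytes(frame_size)
    kept = bytearray()
    pending = 0  # number of zero frames buffered so far
    for pos in range(0, len(frames) - frame_size + 1, frame_size):
        frame = frames[pos:pos + frame_size]
        if frame == zero_frame:
            pending += 1
            continue
        if pending < threshold:
            kept += zero_frame * pending  # short run: part of the audio, keep it
        pending = 0
        kept += frame
    if pending < threshold:               # short run at the very end
        kept += zero_frame * pending
    return bytes(kept)


def strip_silence_wav(src, dst, threshold):
    """Copy src to dst (file names or file objects), dropping long zero runs."""
    reader = wave.open(src, "rb")
    params = reader.getparams()
    frames = reader.readframes(reader.getnframes())
    reader.close()
    frame_size = params.sampwidth * params.nchannels
    writer = wave.open(dst, "wb")
    writer.setparams(params)
    writer.writeframes(strip_silence(frames, frame_size, threshold))
    writer.close()


# Mono 16-bit demo data: the lone zero survives, the longer runs are removed.
samples = [5, 0, 7, 0, 0, 0, 9, 0, 0, 0, 0]
data = struct.pack("<%dh" % len(samples), *samples)
stripped = struct.unpack("<4h", strip_silence(data, 2, 3))
print(stripped)
```

The wave module fixes up the frame count in the header when the output is closed, so the shortened file stays valid.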

Python is not a great tool for this sort of task though. :(

Upvotes: 1

Paweł Polewicz

Reputation: 3852

You might want to try using sox, a command-line sound processing tool. It has many effects, one of which is silence:

silence: Removes silence from the beginning, middle, or end of a sound file. Silence is anything below a specified threshold.

It supports multiple sound formats and it's quite fast, so parsing large files shouldn't be a problem.

To remove silence from the middle of a file, specify a below_periods that is negative. This value is then treated as a positive value and is also used to indicate the effect should restart processing as specified by the above_periods, making it suitable for removing periods of silence in the middle of the sound file.

I haven't found any Python bindings for libsox, but you can call it the same way you call any other command-line program from Python (or you can rewrite it yourself, using the sox sources for guidance).
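
A minimal sketch of calling sox from Python (Python 3) might look like the following. The helper names are hypothetical, the argument order follows the silence effect's above_periods / duration / threshold / below_periods / duration / threshold pattern, and the 0.1 second and 1% values are assumptions that would need tuning for real recordings. It also assumes the sox executable is installed and on the PATH.

```python
import subprocess


def build_sox_silence_command(src, dst, duration="0.1", threshold="1%"):
    """Build a sox command line that removes silence throughout a file.

    duration and threshold are illustrative defaults, not recommended values.
    """
    return [
        "sox", src, dst, "silence",
        "1", duration, threshold,    # above_periods=1: trim silence at the start
        "-1", duration, threshold,   # negative below_periods: also cut silence in the middle
    ]


def remove_silence_with_sox(src, dst):
    """Run sox; raises CalledProcessError if sox exits with an error."""
    subprocess.check_call(build_sox_silence_command(src, dst))


command = build_sox_silence_command("in.wav", "out.wav")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.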

Upvotes: 1

Stephan202

Reputation: 61589

I have no experience with this, but have a look at the wave module in the standard library. That may do what you want. Otherwise you'll have to read the file as a byte stream and cut out sequences of 0-bytes (but you cannot just cut out all 0-bytes, as that would invalidate the file...)

Upvotes: 1
