Reputation: 2550
I'm trying to unpack the data from a single channel WAVE file using struct.unpack
. I want to store the data in an array and be able to manipulate it (say by adding noise of given variance). I have extracted the header data and stored it in a dictionary as follows:
stHeaderFields['ChunkSize'] = struct.unpack('<L', bufHeader[4:8])[0]
stHeaderFields['Format'] = bufHeader[8:12]
stHeaderFields['Subchunk1Size'] = struct.unpack('<L', bufHeader[16:20])[0]
stHeaderFields['AudioFormat'] = struct.unpack('<H', bufHeader[20:22])[0]
stHeaderFields['NumChannels'] = struct.unpack('<H', bufHeader[22:24])[0]
stHeaderFields['SampleRate'] = struct.unpack('<L', bufHeader[24:28])[0]
stHeaderFields['ByteRate'] = struct.unpack('<L', bufHeader[28:32])[0]
stHeaderFields['BlockAlign'] = struct.unpack('<H', bufHeader[32:34])[0]
stHeaderFields['BitsPerSample'] = struct.unpack('<H', bufHeader[34:36])[0]
When I pass in a file I get the following output:
NumChannels: 1
ChunkSize: 78476
BloackAlign: 0
Filename: foo.wav
ByteRate: 32000
BlockAlign: 2
AudioFormat: 1
SampleRate: 16000
BitsPerSample: 16
Format: WAVE
Subchunk1Size: 16
I then try to get the data by doing struct.unpack('<h', self.bufHeader[36:])[0]
but doing this returns a simple integer value 24932
. I'm not allowed to use the wave
library or anything else to do with waves specifically as I Will have to adapt this to other sorts of signals. How can I store and manipulate the actual wave data?
EDIT:
while chunk_reader < stHeaderFields['ChunkSize']:
data.append(struct.unpack('<H', bufHeader[chunk_reader:chunk_reader+stHeaderFields['BlockAlign']]))
Upvotes: 1
Views: 1390
Reputation: 2576
Okay, I'll try to write a complete walk-through.
First, it is a common mistake to treat WAV (or, more likely, RIFF) file as a linear structure. It is actually a tree, with each element having a 4-byte tag, 4-byte length of data and/or child elements, and some kind of data inside.
It is just common for WAV files to have only two child elements ('fmt ' and 'data'), but it also may have metadata ('LIST') with some child elements ('INAM', 'IART', 'ICMT' etc.) or some other elements. Also there's no actual order requirement for blocks, so it is incorrect to think that 'data' follows 'fmt ', because metadata may stick in between.
So let's look at the RIFF file:
'RIFF'
|-- file type ('WAVE')
|-- 'fmt '
| |-- AudioFormat
| |-- NumChannels
| |-- ...
| L_ BitsPerSample
|-- 'LIST' (optional)
| |-- ... (other tags)
| L_ ... (other tags)
L_ 'data'
|-- sample 1 for channel 1
|-- ...
|-- sample 1 for channel N
|-- sample 2 for channel 1
|-- ...
|-- sample 2 for channel N
L_ ...
So, how should you read a WAV file? Well, first you need to read 4 bytes from the beginning of the file and make sure it is RIFF
or RIFX
tag, otherwise it is not a valid RIFF file. The difference between RIFF
and RIFX
is the former uses little-endian encoding (and is supported everywhere) while the latter uses big-endian (and virtually nobody supports it). For simplicity let's assume we're dealing only with little-endian RIFF files.
Next you read the root element length (in file endianness) and following file type. If file type is not WAVE
, it is not a WAV file, so you might abandon further processing. After reading the root element, you start to read all child elements and process those you're interested in.
Reading fmt
header is pretty straightforward, and you have actually done it in your code.
Data samples are usually represented as 1, 2, 3 or 4 bytes (again, in the file endianness). The most common format is a so-called s16_le
(you might have seen such naming in some audio processing utilities like ffmpeg), which means samples are presented as signed 16-bit integers in little endian. Other possible formats are u8
(8-bit samples are unsigned numbers!), s24_le
, s32_le
. Data samples are interleaved, so it is easy to seek to arbitrary position in a stream even for multi-channel audio. Note: this is valid only for uncompressed WAV files, as indicated by AudioFormat == 1. For other formats data samples may have another layout.
So let's take a look at a simple WAV reader:
stHeaderFields = dict()
rawData = None
with open("file.wav", "rb") as f:
riffTag = f.read(4)
if riffTag != 'RIFF':
print 'not a valid RIFF file'
exit(1)
riffLength = struct.unpack('<L', f.read(4))[0]
riffType = f.read(4)
if riffType != 'WAVE':
print 'not a WAV file'
exit(1)
# now read children
while f.tell() < 8 + riffLength:
tag = f.read(4)
length = struct.unpack('<L', f.read(4))[0]
if tag == 'fmt ': # format element
fmtData = f.read(length)
fmt, numChannels, sampleRate, byteRate, blockAlign, bitsPerSample = struct.unpack('<HHLLHH', fmtData)
stHeaderFields['AudioFormat'] = fmt
stHeaderFields['NumChannels'] = numChannels
stHeaderFields['SampleRate'] = sampleRate
stHeaderFields['ByteRate'] = byteRate
stHeaderFields['BlockAlign'] = blockAlign
stHeaderFields['BitsPerSample'] = bitsPerSample
elif tag == 'data': # data element
rawData = f.read(length)
else: # some other element, just skip it
f.seek(length, 1)
Now we know file format info and its sample data, so we can parse it. As it was said, sample may have any size, but for now let's assume we're dealing only with 16-bit samples:
blockAlign = stHeaderFields['BlockAlign']
numChannels = stHeaderFields['NumChannels']
# some sanity checks
assert(stHeaderFields['BitsPerSample'] == 16)
assert(numChannels * stHeaderFields['BitsPerSample'] == blockAlign * 8)
for offset in range(0, len(rawData), blockAlign):
samples = struct.unpack('<' + 'h' * numChannels, rawData[offset:offset+blockAlign])
# now samples contains a tuple with sample values for each channel
# (in case of mono audio, you'll have a tuple with just one element).
# you may store it in the array for future processing,
# change and immediately write to another stream, whatever.
So now you have all the samples in rawData, and you may access and modify it as you like. It might be handy to use Python's array() to effectively access and modify data (but it won't do in case of 24-bit audio, you'll need to write your own serialization and deserialization).
After you've done with data processing (which may involve upscaling or downscaling the number of bits per sample, changing number of channels, sound levels manipulation etc.), you just write a new RIFF
header with correct data length (usually may be computed with a simplified formula 36 + len(rawData)
), an altered fmt
header and data
stream.
Hope this helps.
Upvotes: 3