Reputation: 4276

Problems decoding streamed mp3 data using JLayer

I'm trying to use the JLayer Java library to decode an MP3 data stream. I have a callback that is called asynchronously when the next chunk of MP3 data arrives from the network. Each chunk contains 4 MP3 frames as a byte[]. This data is passed to short[] decode(byte[] mp3_data), which returns a short[] PCM audio buffer. Inside the while loop, each decoded frame is appended to the buffer with concatArrays() until all the MP3 frames are exhausted. The problem is that the first 2 or sometimes 3 frames return a PCM buffer filled with zeros, whereas the remaining 1 or 2 return valid 16-bit audio values.

    public short[] decode(byte[] mp3_data) throws IOException {

        SampleBuffer output = null;
        InputStream inputStream = new ByteArrayInputStream(mp3_data);
        short[] pcmOut = {};
        try {
            Bitstream bitstream = new Bitstream(inputStream);
            Decoder decoder = new Decoder();
            boolean done = false;
            int i = 0;
            while (! done) {
                Header frameHeader = bitstream.readFrame();
                if (frameHeader == null) {
                    done = true;
                } else {
                    output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
                    short[] next = output.getBuffer();
                    pcmOut = concatArrays(pcmOut, next);
                }

                bitstream.closeFrame();
                i++;
            }
            return pcmOut;

        } catch (BitstreamException e) {
            throw new IOException("Bitstream error: " + e);
        } catch (DecoderException e) {
            Log.w(LOG_TAG, "Decoder error", e);
        }
        return null;
    }


    short[] concatArrays(short[] A, short[] B) {

        int aLen = A.length;
        int bLen = B.length;
        short[] C= new short[aLen+bLen];

        System.arraycopy(A, 0, C, 0, aLen);
        System.arraycopy(B, 0, C, aLen, bLen);

        return C;
    }

LOG OUTPUT

Frame 0 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 1 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 2 len: 2304, First 10 samples: [-4128, -4158, -4252, -3934, -4452, -3775, -4799, -3762, -5430, -4092]
Frame 3 len: 2304, First 10 samples: [-18050, -19711, -18184, -19753, -18143, -19595, -17046, -18362, -14773, -15933]

Frame 0 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 1 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 2 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 3 len: 2304, First 10 samples: [2455, 2345, 5253, 5129, 6716, 6442, 7475, 6866, 8461, 7444]

Frame 0 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 1 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 2 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 3 len: 2304, First 10 samples: [951, 1322, 1497, 1929, 1615, 2198, 1320, 2134, 1040, 2114]

Frame 0 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 1 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 2 len: 2304, First 10 samples: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Frame 3 len: 2304, First 10 samples: [-10213, -9578, -11691, -10867, -13686, -12770, -14837, -13874, -15619, -14574]

As you can see from printing the PCM buffers for each 4-frame MP3 chunk, the first 2 to 3 buffers are filled with zeros. Does anyone with JLayer experience see an obvious problem with my method?

Upvotes: 2

Answers (2)

Yozek

Reputation: 101

I've been experimenting with MP3 decoding using JLayer and I'm facing the same issue: for each frame I get lots of zeros followed by several non-zero PCM samples.

I assumed the decodeFrame() method would return the actual decoded PCM samples, because it has already done the Huffman decoding, requantization, and polyphase resynthesis of the encoded data for me.

As a result, there are more PCM samples in total than there should be, so I decided to strip out all the zero-valued PCM samples and write the rest out in WAV format. I know it's a bit weird, but now it sounds as it should!

The song I decoded is CBR and mono, just to keep things simple.

I thought that maybe all those zeros had something to do with the bit reservoir: if the song and the psychoacoustic model used don't really need them, they're set to zero. Then I ran other tests.

What I've concluded is that each Layer 3 frame is decoded into a buffer of 2304 PCM samples, and for a mono song only the first half is non-zero, while the second half is all zeros. With a stereo MP3, almost all samples are non-zero, except of course at the very beginning of the song.

So it seems that this 'issue' only arises with mono-encoded MP3s. With a stereo MP3 I get all the correct PCM samples; with a mono MP3 I just need to take the first half of the decoded PCM samples per frame.
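One possible explanation for the zero-filled half, offered here as an assumption rather than something confirmed in this thread: the decoder hands back a fixed-size array sized for two channels (2 × 1152 shorts), and a mono frame fills only the first 1152 entries. If that is the case, copying the valid prefix is safer than stripping zeros by value, because stripping by value would also delete genuine zero-valued samples. In the sketch below, `PcmTrim`, `validPrefix` and `validLength` are hypothetical names; JLayer's `SampleBuffer.getBufferLength()` looks like the natural source for the length, but check it against the version you use.

```java
import java.util.Arrays;

final class PcmTrim {
    /**
     * Returns a copy of the first validLength samples of a decoder output buffer.
     * validLength is supplied by the caller (assumption: the decoder reports it).
     */
    static short[] validPrefix(short[] buffer, int validLength) {
        if (validLength < 0 || validLength > buffer.length) {
            throw new IllegalArgumentException("validLength out of range: " + validLength);
        }
        return Arrays.copyOf(buffer, validLength);
    }
}
```

For a mono frame this would be called as `PcmTrim.validPrefix(buffer, 1152)`, keeping any real zero samples inside the first 1152 entries.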

But isn't this a waste of space for an audio compression algorithm? Maybe I'm still missing something...

Hope this helps a bit...

EDIT

From what I can see, the channels are interleaved in the frame: for a 2-channel MP3, the 2304 decoded PCM samples are:

L[0],R[0],L[1],R[1],L[2],R[2],...,L[1151],R[1151]

The output WAV file now sounds much better than before.
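To make the interleaved layout concrete, here is a minimal sketch that splits such a buffer into separate left and right arrays. `Deinterleave` and `split` are hypothetical helper names, not part of JLayer, and the code assumes exactly two channels.

```java
final class Deinterleave {
    /** Splits interleaved stereo PCM (L0, R0, L1, R1, ...) into {left, right}. */
    static short[][] split(short[] interleaved) {
        if (interleaved.length % 2 != 0) {
            throw new IllegalArgumentException("stereo buffer must have an even length");
        }
        int samplesPerChannel = interleaved.length / 2;
        short[] left = new short[samplesPerChannel];
        short[] right = new short[samplesPerChannel];
        for (int i = 0; i < samplesPerChannel; i++) {
            left[i] = interleaved[2 * i];       // even positions hold the left channel
            right[i] = interleaved[2 * i + 1];  // odd positions hold the right channel
        }
        return new short[][] { left, right };
    }
}
```

A 2304-sample stereo frame yields two arrays of 1152 samples each. A WAV file with two channels expects the interleaved form, so splitting is only needed when the channels are processed separately.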

Upvotes: 0

Durandal

Reputation: 20059

What is the problem? First, many MP3s start with silence. Second, due to the nature of PCM synthesis, it takes a while to fill the polyphase synthesis filter bank, so the very first samples will very likely be zeros: the synthesis filter starts out with all zeros in its 16 banks.

Look at the entire frame to decide whether it's silent, not just at 10 samples.
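A whole-frame check could look like the following sketch. `SilenceCheck`, `isSilent` and `firstNonZero` are hypothetical helpers, and `length` is the number of valid samples in the buffer, which the caller has to supply.

```java
final class SilenceCheck {
    /** Returns the index of the first non-zero sample in pcm[0..length), or -1 if there is none. */
    static int firstNonZero(short[] pcm, int length) {
        for (int i = 0; i < length; i++) {
            if (pcm[i] != 0) {
                return i;
            }
        }
        return -1;
    }

    /** True only if every one of the first length samples is zero. */
    static boolean isSilent(short[] pcm, int length) {
        return firstNonZero(pcm, length) < 0;
    }
}
```

Logging `firstNonZero` per frame instead of the first 10 samples shows whether a frame is entirely zero or merely starts with a run of zeros.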

EDIT: You apparently are not familiar with how MP3 works internally, so I'll elaborate a bit on the basics.

An MP3 frame contains the header word (which gives the bit rate, sample rate and stereo type) and some control information. The majority of the frame consists just of packed data. Contrary to what is usually implied when people talk about MP3, the packed data does not belong entirely to that single frame. A frame can "borrow" packed data space from its predecessors, and it can also carry data belonging to the following frame(s). CBR (constant bit rate) just means that all frames are the same size, but particularly complicated frames may still be allocated more bits by borrowing space from preceding frames (this decision is made by the encoder when it creates the stream). VBR just adds the possibility of also varying the frame size; technically, CBR streams can already allocate a variable number of bits per frame, just within tighter limits than VBR.

To decouple decoding from the unevenly allocated frame data, the decoder feeds the packed data it receives with each frame into a FIFO buffer called the "bit reservoir", which ensures that all data borrowed from previous frames is remembered until the decoding pipeline requests it.
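The effect of an empty reservoir can be illustrated with a toy model. This is a sketch under stated assumptions, not JLayer's actual code: each frame carries a back-pointer (called `main_data_begin` in the MPEG audio specification, at most 511 bytes for MPEG-1 Layer III) saying how many bytes before its own data region its packed data starts. `ReservoirSketch`, `decodable` and the byte counts used with it are invented for illustration.

```java
final class ReservoirSketch {
    /** Largest back-pointer in MPEG-1 Layer III (a 9-bit field). */
    static final int MAX_BACK_POINTER = 511;

    /**
     * Toy model: for each frame, reports whether the data it borrows from earlier
     * frames is already in the reservoir. backPointers[i] is how far back frame i
     * reaches; mainDataBytes[i] is how much packed data frame i contributes.
     */
    static boolean[] decodable(int[] backPointers, int[] mainDataBytes) {
        boolean[] result = new boolean[backPointers.length];
        int available = 0; // bytes remembered from earlier frames; 0 for a fresh decoder
        for (int i = 0; i < backPointers.length; i++) {
            result[i] = backPointers[i] <= available;
            // The frame's own data is buffered either way, for later frames to borrow from.
            available = Math.min(MAX_BACK_POINTER, available + mainDataBytes[i]);
        }
        return result;
    }
}
```

With made-up back-pointers `{300, 500, 200, 100}` and 400 bytes of packed data per frame, the model marks the first two frames as undecodable and the last two as decodable. A decoder that starts with an empty reservoir in the middle of a stream might well emit silence for such leading frames, which would match a log where the first frames of a chunk are all zeros.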

Data from the bit reservoir is then Huffman-decoded and processed through some complex math to produce time-frequency samples. To transform those into PCM, they are fed into the synthesis filter. The synthesis filter remembers each time-frequency sample for a fixed period of time into the past in its "banks" (technically a fixed number of steps, since the wall-clock time varies with the sample rate). Each time-frequency sample influences multiple PCM samples, and the oldest is pushed out by the newest.

This entire decoding pipeline introduces quite some latency. Seeking inside an MP3 properly is non-trivial due to the latency of the pipeline, and it is further complicated by the bit-reservoir borrowing mechanism.
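Since both the bit reservoir and the synthesis filter carry state from frame to frame, creating a new Bitstream and Decoder for every chunk, as the decode() method in the question does, plausibly throws that state away each time. One way to keep it is to let a single long-lived decoder read from one continuous stream that the network callback feeds. The class below is a minimal sketch of such a stream using only the Java standard library; `ChunkQueueInputStream`, `offerChunk` and `finish` are hypothetical names, not part of JLayer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ChunkQueueInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // end-of-stream marker, compared by identity
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private byte[] current; // chunk currently being read
    private int pos;        // read position inside current
    private boolean finished;

    /** Called from the network callback for each chunk that arrives. */
    void offerChunk(byte[] chunk) {
        if (chunk.length > 0) {
            queue.add(chunk.clone());
        }
    }

    /** Signals that no more chunks will arrive. */
    void finish() {
        queue.add(EOF);
    }

    @Override
    public int read() throws IOException {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, dst, off, n);
        pos += n;
        return n;
    }

    /** Blocks until unread data is available; returns false once the stream has ended. */
    private boolean fill() throws IOException {
        while (!finished && (current == null || pos >= current.length)) {
            try {
                current = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
            pos = 0;
            if (current == EOF) finished = true;
        }
        return !finished;
    }
}
```

A dedicated decoding thread would then construct `new Bitstream(stream)` and `new Decoder()` once and run the same readFrame() / decodeFrame() / closeFrame() loop as in the question, while the callback only calls `offerChunk`. Whether this removes the zero-filled frames depends on the stream and is untested here.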

Upvotes: 3
