Reputation: 91
I'm working on a project that involves receiving an I2S signal using two data lines, each carrying one channel of audio and sharing the same word and bit clock. To do this, I'm using the SAI (Serial Audio Interface) on an STM32F7 MCU. It was designed for this: each SAI instance has two "sub-blocks", A and B (as the reference manual calls them), that can be synchronized to share the same bit and word select clocks.
To receive my I2S signal, I have both the A and B sub-blocks configured to receive 16-bit I2S words and share the same clocks (A is asynchronous, B is synchronous to A). This works fine and samples are captured in sync using DMA, but the problem is that they end up in two separate buffers in memory: (each cell is a byte, with two bytes per sample; blue is SAI A and green is SAI B)
However, I'd like the 2-byte samples from the two SAI sub-blocks to be interlaced in memory, so one sample is from A, the next is from B, and so on, the way audio is typically stored:
It's trivial for me to write a for loop to interlace the two buffers together, but that burns a lot of CPU time that could be spent doing other things. I'm wondering if there's some hardware trick or feature I've overlooked that could help me achieve this. I'm honestly surprised that the SAI feature doesn't have any kind of native method of doing this, but I've looked through that whole section of the reference manual and didn't see anything that would help me.
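For reference, here is a minimal sketch of the CPU loop in question, plus a variant that builds each A/B pair in a register and stores it as one 32-bit word. The function names are hypothetical, and the packed variant assumes a little-endian target (which the STM32F7 is) and a 4-byte-aligned output buffer. Whether it is actually faster has not been measured here; the expectation is only that it halves the number of stores, and that a compiler targeting the Cortex-M7 may be able to reduce the shift-and-OR to a single halfword-pack instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: copy one 16-bit sample at a time into alternating slots. */
void interleave_u16(const uint16_t *a, const uint16_t *b,
                    uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = a[i];  /* even slots: sub-block A */
        out[2 * i + 1] = b[i];  /* odd slots:  sub-block B */
    }
}

/* Variant: combine one A sample and one B sample into a 32-bit word and
 * store it once. On a little-endian target the low halfword lands at the
 * lower address, so A still comes first in memory.
 * Assumption: `out` is 4-byte aligned and holds n words. */
void interleave_u16_packed(const uint16_t *a, const uint16_t *b,
                           uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint32_t)a[i] | ((uint32_t)b[i] << 16);
    }
}
```

Both functions take n samples per channel and produce 2*n interleaved samples (n words in the packed case).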
If the DMA feature of the STM32 allowed the output memory pointer to be incremented by a different size than the data being transferred (for example, transferring only two bytes but incrementing the output pointer by four), this would be easy, but it doesn't appear to allow that unless I use the FIFO feature of the DMA. Using FIFO mode, I could make it buffer one 16-bit sample into a 32-bit buffer and write that buffer to the output when it's half full, effectively "making space" by separating each 16-bit sample from the next by another 16 bits, as shown below. However, this only works for one channel, since it appears to overwrite all 32 bits even if they aren't all used. The CPU would still have to do the work of transferring the other channel into that unused space.
Additionally, I came up with a somewhat janky solution using the 2D DMA feature which is intended to be used to render images. Using the above method for one channel while using a normal DMA for the other, I could use the 2D DMA feature to copy the second channel into the unused space made for it in the first channel's buffer. To do this, I effectively configured the 2D DMA to transfer a 16bpp "image" (as far as it's concerned) of size 1x<buffer size> into a canvas "image" that's 2x<buffer size>, making it skip every other output "pixel" (which is actually a sample). It's janky, but it worked on my prototype board. Unfortunately, however, the STM32 I'm actually going to be using for this project doesn't support this feature so it isn't useful.
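To make the 2D copy trick concrete, here is a small host-side model of it in plain C. It is only a sketch of the addressing pattern, not driver code: the function name and parameters are hypothetical, and the model simply assumes the 2D engine copies `width` 16-bit "pixels" per line and then skips `out_offset` pixels in the destination before starting the next line.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of a memory-to-memory 2D copy with 16-bit "pixels".
 *   width      - pixels copied per line
 *   lines      - number of lines
 *   out_offset - destination pixels skipped after each line
 * The source is read contiguously (no input offset in this model). */
void copy2d_model(const uint16_t *src, uint16_t *dst,
                  size_t width, size_t lines, size_t out_offset)
{
    size_t s = 0;  /* index into the source "image" */
    size_t d = 0;  /* index into the destination "canvas" */

    for (size_t y = 0; y < lines; y++) {
        for (size_t x = 0; x < width; x++) {
            dst[d++] = src[s++];
        }
        d += out_offset;  /* skip the slot that belongs to the other channel */
    }
}
```

With channel A already sitting in the even slots of a buffer `mixed` of 2*n samples, calling `copy2d_model(b, mixed + 1, 1, n, 1)` treats channel B as a 1-pixel-wide, n-line image and drops each sample into the odd slot next to its A counterpart.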
Is there any hardware trick I can use to interlace two buffers of 16-bit samples into another buffer so that samples from the two buffers alternate, the way audio is typically stored? Ideally this would be done without the CPU needing to do much work. If not, are there specific assembly instructions that could get this done much faster than a simple C for loop?
Thanks, I really appreciate it.
Upvotes: 2
Views: 821
Reputation: 1
I had a similar problem, and it turned out to be caused by a wrong DMA transfer width setting: 32 bits (the default in CubeMX) instead of 16.
Upvotes: 0