Reputation: 157
I'd like some help improving the efficiency of my circular buffer code.
I had a look around stackoverflow and found that (nearly) all of the topics on circular buffers are about the uses of such a buffer or the basic implementation of a circular buffer. I really need information about how to make it super efficient.
The plan is to use this buffer with the STM32F4 microcontroller which has a single precicion FPU. I plan to make heavy use of especially the write() and readn() functions. We're literally talking a few million calls a second here so shaving of a few clock cycles here and there is really going to make a difference.
I'll put the most important bits of code here, the full buffer code is available via http://dl.dropbox.com/u/39710897/circular%20buffer.rar
Can anyone provide me with a few pointers on how to improve the efficiency of this buffer?
#define BUFF_SIZE 3 // buffer size set at compile time
typedef struct buffer{
float buff[BUFF_SIZE];
int readIndex;
int writeIndex;
}buffer;
/********************************\
* void write(buffer* buffer, float value)
* writes value into the buffer
* @param buffer* buffer
* pointer to buffer to be used
* @param float value
* valueto be written in buffer
\********************************/
void write(buffer* buffer,float value){
buffer->buff[buffer->writeIndex]=value;
buffer->writeIndex++;
if(buffer->writeIndex==BUFF_SIZE)
buffer->writeIndex=0;
}
/********************************\
* float readn(buffer* buffer, int Xn)
* reads specified value from buffer
* @param buffer* buffer
* pointer to buffer to be read from
* @param int Xn
* specifies the value to be read from buffer counting backwards from the most recently written value
* i.e. the most recently writen value can be read with readn(buffer, 0), the value written before that with readn(buffer, 1)
\********************************/
float readn(buffer* buffer, int Xn){
int tempIndex;
tempIndex=buffer->writeIndex-(Xn+1);
while(tempIndex<0){
tempIndex+=BUFF_SIZE;
}
return buffer->buff[tempIndex];
}
Upvotes: 10
Views: 15300
Reputation: 12943
As "Oli Charlesworth" suggested - you'd be able to simplify things if your buffer size is a power of 2. I'd like to write the read/write function bodies, so that the intent is more clear.
#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)
struct buffer {
float buff[BUFF_SIZE];
unsigned writeIndex;
};
void write(struct buffer *buffer, float value) {
buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}
float readn(struct buffer *buffer, unsigned Xn){
return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}
Some explanations. Note that there's no branching (if
) at all. We don't limit the array index to the array bounds, instead we're AND-ing it with the mask.
Upvotes: 16
Reputation: 1699
The accepted answer contains code that is incorrect and will invoke undefined behavior. Correction below:
#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)
struct buffer {
float buff[BUFF_SIZE];
unsigned writeIndex;
};
void write(struct buffer *buffer, float value) {
buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}
float readn(struct buffer *buffer, unsigned Xn){
return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}
The error in the original answer was to assume that 'int' will wrap around. Using binary masks with int is also unwise.
Upvotes: 1
Reputation: 214040
Keeping track of the start and the end of the circular buffer with pointers, is likely a bit faster than array indexing, since the address will computed in runtime in case of the latter. Try to replace readIndex and writeIndex with float*
instead. The code would then be
*buffer->writeIndex = value;
buffer->writeIndex++;
if(buffer->writeIndex == buffer + BUFF_SIZE)
buffer->writeIndex=buffer->buff;
buffer + BUFF_SIZE
will still be a constant expression and the compiler will translate that to a fixed address at compile time.
Upvotes: 2
Reputation: 28535
This may not be seem elegant but is efficient. Accessing structure elements through the pointer is taking up a lot of instructions. Why not remove the structure altogether and make buffer
and writeIndex
as global variables? This will considerably decrease the size of your readn
and write
functions.
I tried in gcc and here is the output with and without the structure
With Structure
_write:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
movl 8(%ebp), %eax
movl 16(%eax), %edx
movl 12(%ebp), %eax
movl %eax, (%ecx,%edx,4)
movl 8(%ebp), %eax
incl 16(%eax)
movl 8(%ebp), %eax
cmpl $3, 16(%eax)
jne L1
movl 8(%ebp), %eax
movl $0, 16(%eax)
L1:
popl %ebp
ret
Without Structure. ie Making buffer
and writeIndex
as global
_write:
pushl %ebp
movl %esp, %ebp
movl _writeIndex, %edx
movl 8(%ebp), %eax
movl %eax, _buff(,%edx,4)
incl _writeIndex
cmpl $3, _writeIndex
jne L1
movl $0, _writeIndex
L1:
popl %ebp
ret
Upvotes: 2
Reputation: 272557
If you can make your buffer size a power-of-2, then the check against zero can be replaced with unconditional bit-masking. On most processors, this should be faster.
Upvotes: 12