cembeddedbinary-searchavr-gcclinear-search

Reputation: 3926

avr-gcc: Binary search vs. linear search

I am searching in a few arrays of struct stored in program memory with lengths of about 30-200 with a linear search like this:

Glyph getGlyph(Font font, uint16_t code) {
    for (size_t i = 0; i < font.length; i++) {
        if (pgm_read_word(&font.glyphs[i].code) == code) {
            static Glyph glyph;
            memcpy_P(&glyph, &font.glyphs[i], sizeof (Glyph));
            
            return glyph;
        }
    }
    
    // return question mark if unknown code point
    return getGlyph(font, 0x003f);
}

Assuming I am correctly reading the disassembly, there are 20 instructions per iteration, 3 of them rather expensive as far as I know: brcc, brne and rjmp.

Just evaluating if this could be done more (power) efficient, I implemented a binary search:

Glyph getGlyph(Font font, uint16_t code) {
    
    // https://en.wikipedia.org/wiki/Binary_search_algorithm
    int16_t l = 0;
    int16_t r = font.length - 1;
    
    while (l <= r) {
        uint8_t m = (l + r) / 2;
        uint16_t c = pgm_read_word(&font.glyphs[m].code);
        if (c < code) {
            l = m + 1;
        } else if (c > code) {
            r = m - 1;
        } else {
            // found code point, get and return glyph
            static Glyph glyph;
            memcpy_P(&glyph, &font.glyphs[m], sizeof (Glyph));
            
            return glyph;
        }
    }
    
    // return question mark if unknown code point
    return getGlyph(font, 0x003f);
}

This results in 35 instructions per iteration in the disassembly, of which 5 are what I think rather expensive: brlt, brcc (2x) and rjmp (2x).

Given that the arrays are rather short, I suppose the benefit of the binary search is not a lot and might be compensated by the increased complexity?

Is it worth or does it make sense at all trying to benchmark this?

The program is compiled with -Os. Compiling with -O3 instead as suggested in comments, it seems the amount of instructions is a multiple, but maybe I am just not interpreting the disassembly correctly. It is running on an ATmega328p and I'm currently building it with gcc 5.4.0 and avr-libc 2.0. I tried also with gcc 13 and avr-libc 2.1 but I find the disassembly quite hard to read.

This is just a simple thermometer and hygrometer with an e-ink display, it runs well and I am satisfied with speed and power consumption, though I lack experience and can't compare with anything else. I'm really just curious and want to learn.

Upvotes: 0

Answers (3)

emacs drives me nuts

Reputation: 3888

If you are after fast code, a lookup-table may be the way to go.

There are two flavours. The code below stands in GNU-C99 so that the __flash qualifier is supported. This allow more comprehensible code that's not cluttered up with pgm_read_xxx. If you stick with pgm_read_xxx, you'll have to translate the code.

Also the code uses pointers intead of copying Font's and Glyph's back and forth. The complete, with avr-gcc v5.4 compileable code follows at the end of the answer.

Lookup-Table in Flash

This is the preferred way to go as it does not require precious RAM. The code below assumes that the lookup table is initialized with 0, which is the default for data in static storage. So when we read an id of 0, we have to check whether is's actually glyph[0] that's correct, or some default like glyph with code = '?'

const __flash Glyph*
getGlyph_lookup_flash (const __flash Font *font, uint8_t code)
{
    uint8_t id = font->lookup_flash[code];
    // 0 is also the default initializer value for data in static storage.
    // Check whether we found glyph[0] or not.
    if (id == 0 && code != font->glyphs[0].code)
        id = font->lookup_flash['?'];

    return & font->glyphs[id];
}

Notes:

If you manage to initialize all unused entries of .lookup_flash with the id of ?, the if instruction is no more necessary.
If you are using just one font, then font->glyphs[0].code is known at compile-time. In the sample code at the end of the answer, it could be replaced with 'a'.
As you wrote in a comment, code fits in uint8_t, hence each font will cost additional 256 bytes of program memory.

Lookup-Table in RAM

This is an alternative approach when lookup-tables in program memory are not feasible for some reason. I am actually using this approach with ATmega168/328 for a vector font with variable glyph sizes, where a lookup table in flash is just too painful to pre-compute. When you basically are using codes from 0x20 to 0x7f (ASCII) with some exceptions, then you can handle exceptions by hand and thus reduce the table size to, say, 0x60 bytes.

The code uses the same lookup with different fonts. If a font-switching is detected, the table is reset. For unknown entries, a linear search is used to find the glyph and its id.

#define INVALID_ID 0xff

typedef struct
{
    const __flash Font *pfont;
    uint8_t code_to_id[0x100];
} lookup_t;

static void init_lookup (uint8_t *p)
{
    uint8_t i = 0;
    do
    {
        *p++ = INVALID_ID;
    } while (++i);
}

const __flash Glyph*
getGlyph_lookup_ram (const __flash Font *font, uint8_t code)
{
    static lookup_t lookup_ram;

    // Unknown font: Reset `lookup_ram'.
    if (font != lookup_ram.pfont)
        init_lookup (lookup_ram.code_to_id);

    uint8_t *pid = & lookup_ram.code_to_id[code];

    if (*pid != INVALID_ID)
        // Known id.
        return & font->glyphs[*pid];

    // Unknown id: Do a linear search for glyph with required .code.
    const __flash Glyph *pglyph = & font->glyphs[0];
    for (size_t i = 0; i < font->length; ++i, ++pglyph)
    {
        if (pglyph->code == code)
        {
            *pid = i;
            return pglyph;
        }
    }
    
    // Return question mark if unknown code point.
    pglyph = getGlyph_lookup_ram (font, '?');

    // id for code is unknown. Set it to id of '?' so that the next time
    // we don't have to search all glyphs again.
    *pid = lookup_ram.code_to_id['?'];

    return pglyph;
}

Compileable Code for avr-gcc v5.4

Finally, here is the complete compile-able example for reference and in godbolt Compiler Explorer

I added boilerplate definitions for Font and Glyph.

#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    uint8_t code;
    uint8_t data[4];
} Glyph;

typedef struct
{
    size_t length;
    Glyph glyphs[4];
    uint8_t lookup_flash[0x100];
} Font;

#define ARRAY_SIZE(X) (sizeof (X) / sizeof (*X))

const __flash Font font1 =
{
    .length = ARRAY_SIZE (font1.glyphs),
    .glyphs =
    {
        [0] = { 'a', { 0xaa, 0xaa, 0xaa, 0xaa } },
        [1] = { 'B', { 0xBB, 0xBB, 0xBB, 0xBB } },
        [2] = { '2', { 0x22, 0x22, 0x22, 0x22 } },
        [3] = { '?', { 0x01, 0x23, 0x45, 0x67 } },
    },
    .lookup_flash =
    {
        ['a'] = 0,
        ['B'] = 1,
        ['2'] = 2,
        ['?'] = 3,
    }
};

const __flash Glyph*
getGlyph_lookup_flash (const __flash Font *font, uint8_t code)
{
    uint8_t id = font->lookup_flash[code];
    // 0 is also the default initializer value for data in static storage.
    // Check whether we found glyph[0] or not.
    if (id == 0 && code != font->glyphs[0].code)
        id = font->lookup_flash['?'];

    return & font->glyphs[id];
}

#define INVALID_ID 0xff

typedef struct
{
    const __flash Font *pfont;
    uint8_t code_to_id[0x100];
} lookup_t;

lookup_t lookup_ram;

static void init_lookup (uint8_t *p)
{
    uint8_t i = 0;
    do
    {
        *p++ = INVALID_ID;
    } while (++i);
}

const __flash Glyph*
getGlyph_lookup_ram (const __flash Font *font, uint8_t code)
{
    // Unknown font: Reset `lookup_ram'.
    if (font != lookup_ram.pfont)
    {
        init_lookup (lookup_ram.code_to_id);
        lookup_ram.pfont = font;
    }

    uint8_t *pid = & lookup_ram.code_to_id[code];

    if (*pid != INVALID_ID)
        // Known id.
        return & font->glyphs[*pid];

    // Unknown id: Do a linear search for glyph with required .code.
    const __flash Glyph *pglyph = & font->glyphs[0];
    for (size_t i = 0; i < font->length; ++i, ++pglyph)
    {
        if (pglyph->code == code)
        {
            *pid = i;
            return pglyph;
        }
    }
    
    // Return question mark if unknown code point.
    pglyph = getGlyph_lookup_ram (font, '?');

    // id for code is unknown. Set it to id of '?' so that the next time
    // we don't have to search all glyphs again.
    *pid = lookup_ram.code_to_id['?'];

    return pglyph;
}

int main (void)
{
    const __flash Glyph *pglyph1 = getGlyph_lookup_flash (&font1, '*');
    const __flash Glyph *pglyph2 = getGlyph_lookup_ram (&font1, 'B');

    // If you actually need the glyph in RAM, you can:
    Glyph glyph = *pglyph1;
    glyph = *pglyph2;

    return 0;
}

Upvotes: 1

user5329483

Reputation: 1272

After checking your github project and after you provided the constraints (ATmega328p) I think I understood your issue.

You have to conserve every bit of your precious 2048 bytes RAM of this ATmega. Placing a lookuptable[256] in RAM is no option due to available memory.

Transform your font table to this:

typedef struct
{
    uint8_t height;    /* Height of all glyphs of this font. */
    Glyph Glyphs[256]; /* Glyphs for all 256 charcodes*/
} newFont; /*sitting in PROGMEM*/

After the data prepared for use in that way, the final getGlyph() becomes an one-liner:

void getGlyph(Glyph *pDest, newFont const/*PROGMEM*/ * pFont, uint8_t code)
{
    memcpy_P(pDest, pFont->Glyphs[code], sizeof(*pDest));
}

Upvotes: 1

Lundin

Reputation: 213276

There are lots of big bottlenecks in this program and they aren't necessarily related to the searching as such. To address them you'll rather need a code review:

Why are you passing structs etc by value? That's extremely inefficient on 8 bitters, even in case of small structs.
When you return a glyph, you first copy it from somewhere (flash) into the static variable (.bss) and then copy it again onto the function call stack and then finally you copy it again in the caller. This is massively inefficient when you could just have returned a const pointer pointing at the original item, or alternatively if these need to be modified in run-time, just copy it once - probably still return a const pointer and let the caller copy the data.

(Might be necessary to use some Harvard architecture tweaks to access flash data directly - whatever your compiler uses to handle that, "PROGMEM" macro or some such.)
Avoid needlessly large counters and indices. If you don't plan to go beyond value 255 then don't use uint16_t. And pretty much never use size_t which will be a 32 bit type and therefore horribly slow.
Recursive functions should pretty much never be used in any context and certainly never in embedded systems. That part must be rewritten as a loop - I don't even understand why you are using recursion at all.
Using standard lib bsearch is not necessarily that much slower than rolling out your own version manually. Benchmark with an oscilloscope.
In general, binary search is a pretty sound algorithm for 8-bitters. We neither need to worry about branching nor alignment, just the raw amount of instructions generated. On the other hand you literally picked one of the slowest CPUs still in production, so why worry about performance.

Overall I think you would benefit from picking a modern MCU instead, such as a Cortex M. Programming old 8-bitters in C is actually pretty hard and it's easy to make mistakes. AVR is 1990s technology, there's not many reasons for using them any longer. (They are a pretty good choice for learning assembler but that's about it.)

Upvotes: 1

avr-gcc: Binary search vs. linear search

Answers (3)

Lookup-Table in Flash

Lookup-Table in RAM

Compileable Code for avr-gcc v5.4

Related Questions