Bri Bri
Bri Bri

Reputation: 2278

What kind of implementation can I use for a static associative array on a vintage system with very limited resources?

I'm working on a project written in C that's going to be deployed on vintage systems with very limited resources. The target system has at minimum a 16 MHz Motorola 68030 processor with a 256 byte L1 cache, and my application has a limit of 1 MB of memory, so everything I do needs to be as CPU and memory efficient as I can manage.

Further, I need to keep heap memory allocations to a minimum, because they're relatively slow and can result in memory fragmentation. Memory alignment is important too, which is why I'm primarily dealing with values that are 2 or 4 bytes, rather than individual bytes.

In this app, I need several static associative arrays. Each array has the same key and value type: the key is a pair of two byte integers (which could be represented as a single four byte integer), and the value is a two byte integer. The data in all of these arrays is determined at compile time, and it will never change during runtime. So there's no need to add or remove keys, or change any values. The size of these arrays is between 100 and 1000 entries.

Given these specifications, how might I go about implementing such an array? What are my options in terms of trading off CPU cycles for memory, or vise versa?

Research so far:

One idea was to sort the data ahead of time and use a binary search to find a value, but I'd prefer to find something that runs in constant time. This has the advantage though of being easy to implement and calculate ahead of time.

I next looked into a hash table using a minimal perfect hash function. This makes sense because the data is static, it could be stored efficiently in contiguous memory, and compile time calculations can be done to determine the hash function. However, after doing more research, I found that the way such hash functions are usually implemented are not well suited for this use case.

That leaves non-minimal perfect hash functions, and the advantage of such a thing is that only the data needs to be stared in the hash table, not the keys as well. So with an efficient enough table, this looks like this might be the most memory efficient option available. But there's still the question of the hash function, and whether there's a good one that can be used on an old processor, especially since the modulo operator will be relatively slow. However, if it still ends up being faster on average than a binary search, this might be the best approach.

Upvotes: 3

Views: 179

Answers (1)

Simon Goater
Simon Goater

Reputation: 1908

As we know, c is good for using memory efficiently, but there comes a point where there have to be trade-offs between CPU cost and memory space. If you can't get a perfect hash function that works to your satisfaction, you can use the hash table method I often use which works as follows. It doesn't require that the data be static, but works best when you know a good upper bound for the number of elements to insert in the hash as there's no quick way to increase its capacity. On modern x86_64, it's feasible to get ~20m/s inserts or selects even when the array is much bigger than L3 cache.

Suppose you have n items, each of which occupied s bytes. For the hash function, if I know how many elements are to be put in it, I often just use modulus of some prime p > n (e.g. p=nextprime(n)) to get the index of the array to start searching/inserting from. Make an array of 2ps bytes (you can go smaller, say 1.5 but then there are a LOT more collisions making look-ups slower) and fill it with some null value. To find something in the array or the place to insert a new object, you just use linear search from the position given by the hash function 2s(key % p), with wrapping, until you find your object or a null. If you use an array of 2ps size, most of the time it will find the object straight away, or when the objects are small, will have to search within the same cache line for it.

There are lots of options, but only you know how much you want to prioritise CPU work vs space.

If you use a struct for your elements, I suggest you check that if you don't have 1 or 2 byte alignment, you could waste a lot of space due to padding. The hash method I described allows you to avoid padding.

To avoid the high cost of multiplication and division on many old or embedded systems, you could instead do something like the following. For 2^(m-1) < p < 2^m make a mask = 2^(m+1) - 1 and then calculate the hash as

h = key & mask

h = (h >= p ? h - p : h)

If this has too many collisions, you can make the first step a bit more complicated to make use of all the bits of the key. Something like,

h = (key + (key >> m) + (key >> 2m) +...) & mask

Upvotes: 3

Related Questions