Reputation: 6414
Given an expensive function of n variables that returns a scalar value:
f(x1, x2, ..., xn) = y
If I wished to memoize this function in a relational database, what kind of table structure should I use, and what data modelling methodologies apply?
(Related but from a different angle: What kind of data model models function parameters and results?)
Upvotes: 0
Views: 173
Reputation: 52137
First of all, a DBMS is not necessarily the best choice to handle memoization. This approach is justified only if the number of results is too huge to fit in RAM or results need to be persisted over a long period of time or need to be reused across multiple, possibly concurrent, clients.
For each function, create a separate table, with columns that correspond to function inputs and result. Create a PK on inputs.
Before evaluating the function (on value1
, value2
, value3
...), do a:
SELECT result
FROM function_table
WHERE
input1 = :value1
AND input2 = :value2
AND input3 = :value3
...
(:
denotes bound parameter, some DBMSes may use a different prefix)
By using separate tables and static tailor-made queries with bound parameters for each function, you can take advantage of query preparation for better performance.
Also, consider clustering the table (if your DBMS supports it), to get the result directly from the B-Tree structure and avoid the need for the table heap lookup.
Upvotes: 1
Reputation: 95642
Depending somewhat on the value of 'n', you can probably model it like this. Assume that the value of 'n' is 137.
create table expensive_function_of_n_vars (
x1 integer not null,
x2 integer not null,
...
x137 integer not null,
primary key (x1, x2, ..., x137),
result integer not null
);
Under normal conditions, I'm reluctant to store a result without including a CHECK() constraint to make sure it's the correct result. In your case, that might not be practical, but you should give it some thought anyway.
This assumes that each column carries some kind of meaning. That is, I'm assuming that, in the real problem domain, each of these columns has a name more meaningful than "x3".
For example, in the article you referenced, the OP uses "height", "width", and "depth". In some applications, those dimensions are not interchangable--you can unambiguously identify which dimension on the real-world object is height, which is width, and which is depth. (One example might be a shipping container on a pallet, where height is obvious, width is the edge a forklift is expected to fit under, and depth is the remaining dimension.) In other applications, they're interchangable, which means you're liable to find "duplicate" primary keys like {2, 3, 5} and {2, 5, 3}. In cases like that, you might want to order the arguments from lowest to highest, and use CHECK() constraints to make sure they're ordered.
This is just straightfoward normalization, with the caveat that, in this case, you're starting in 6NF, I think, so there's not much to do.
Upvotes: 1