yadda
yadda

Reputation: 197

Prolog: optimizing global structures for performance

I have written a prolog program and now am trying to optimize it for performance (yes, this use case actually needs it).

First some background on how the program was originally structured, so you know where I've already been.

The system keeps customer (user) orders in the logic base, which are dynamically asserted into the logic base as they come in (and dynamically removed once processed using retract). Originally the orders were structured like so:

order(RegionID, UserID, UserBalance, OrderID, ProductID, Price, ...) .
order(RegionID, UserID, UserBalance, OrderID, ProductID, Price, ...) .
...
order(RegionID, UserID, UserBalance, OrderID, ProductID, Price, ...) .

I liked this just fine, however during testing, I populated the system with 50,000 orders and found that it took inordinately long to PROCESS (on the order of a few minutes - it needed to be better). I profiled and found the most time was spent going through the logic base scooping up the orders for processing, so decided to try another scheme.

This makes sense because particular users are tied to particular regions:

order(RegionID, [ (UserID, UserBalance, OrderID, ProductID, Price, ...), (UserID, UserBalance, OrderID, ProductID, Price, ...), ...]) .
order(RegionID, [ (UserID, UserBalance, OrderID, ProductID, Price, ...), (UserID, UserBalance, OrderID, ProductID, Price, ...), ...]) .
...
order(RegionID, [ (UserID, UserBalance, OrderID, ProductID, Price, ...), (UserID, UserBalance, OrderID, ProductID, Price, ...), ...]) .

What I am doing here is storing a long list of user orders for each region. To test this, I made the lists within the order structures 50,000 in length (50,000 orders). This performed much better than the original scheme in PROCESSING the orders (25% - 30% of the original time); however in ADDING orders to the system, it performs much worse by at least an order of magnitude if not more.

The order adding procedure is quite simple. I simply retract the order structure with an instantiated the RegionID, then reassert with an additional order tacked on to the head (something like this):

retract( order(california, OldOrders ) ).
assert ( order(california, [ NewOrder | OldOrders ] ) ).

I would have presumed this to be reasonably fast as I'm just adding something to the head, but it isn't. My guess is there is a lot of copying of the long list going on behind the scenes.

My question is simply how to optimize this more for speed. You may suggest a different structuring of the data, a different algorithm, a different mechanism for storing this stuff (I only know assert/retract, but different prologs may have more exotic mechanisms?), or whatever you want. Keep in mind that with any suggestions, I don't want to go backwards on order PROCESSING (vs. ADDING).

I am currently using Eclipse (the prolog, not the IDE), however I could easily switch to XSB, yap, or any other free prolog if your suggestion requires it. Just note that we need to stick to faster prologs rather than slower ones like SWI.

Thanks for any suggestions.

Upvotes: 2

Views: 140

Answers (1)

Daniel Lyons
Daniel Lyons

Reputation: 22803

I think your principal problem is that you're not benefiting from indexing because your order processing queries begin with the user ID instantiated and that is not the first argument of the term. Probably you're doing a query with two or three of these arguments instantiated, but Prolog is only able to winnow down to instances of order/N that have the same region ID based on the argument order.

For fun, you might try flipping things around like so:

order(UserID, RegionID, UserBalance, OrderID, ProductID, Price, ...) .

or even

order(OrderID, UserID, RegionID, UserBalance, ProductID, Price, ...) .

and just see what effect those alternatives have on performance. (I might be wrong; Eclipse is the one I know the least about.) Most Prologs index on the first argument only in the absence of other information. I don't know if Eclipse has an index declaration like SWI, but in the case of SWI, you used to be able to simply say something like:

:- index(order/7, [1,2]).

(assuming 7 is the correct arity) and it would index on the first two arguments which would be enough to greatly improve your "scoop up" time. Nowadays this is ignored and a much more complex system is used instead which might mean you see performance benefits just by trying it in SWI. Might be worth a look, since you are open to it. Eclipse may have something similar to this.

As a portable option, you can build your own indexes using term_hash/2. I've never used this option myself. The basic idea as I understand it is to bundle up all the values you might query on in a single term and then generate a hash from that term, and use that to build a new relation so that the hash value is the first argument. I suspect this option would look something like this (untested):

:- initialization rebuild_index/0.
:- dynamic order_by_order_id_and_user_id/2.

rebuild_index :-
    order(OrderId, UserId, ...),
    term_hash(order(OrderId, UserId), Hash),
    assertz(order_by_order_id_and_user_id(Hash, order(OrderId, UserId, ...)).

find_order_by_order_id_and_user_id(OrderId, UserId, Order) :-
    term_hash(order(OrderId, UserId), Hash),
    order_by_order_id_and_user_id(Hash, Order).

This will of course only work if your Prolog is going to generate indexes for dynamic predicates.

If you were using SWI-Prolog I would also (politely) suggest moving the database to an RDBMS and using the ODBC interface to query it. It's much easier to optimize performance in the database (I'd personally rather just issue CREATE INDEX orders_by_order_id_and_user_id ON orders (order_id, user_id) and see performance "magically" improve than have to write a bunch of boilerplatey access code like the above) and then you get the benefit of the RDBMS as "integration technology" rather than simply persistence/storage technology. I don't know if other Prolog's have similar capabilities for accessing databases.

Whatever you find out that works, please come back and submit as an answer, I think we'd all benefit from knowing a little bit more about what the performance ramifications are of various alternatives.

Upvotes: 1

Related Questions