Jon Romero

Reputation: 4090

Getting lots of data from Mnesia - fastest way

I have a record:

-record(bigdata, {mykey,some1,some2}).

Is doing a

mnesia:match_object({bigdata, mykey, some1,'_'})

the fastest way fetching more than 5000 rows?

Clarification: Creating "custom" keys is an option (so I can do a read), but is doing 5000 reads faster than a match_object on one single key?
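To make it concrete, here is roughly what I mean by the two options (just a sketch; it assumes the bigdata table already exists, and MyKey/Keys are placeholders):

%% Option A: one match_object over the shared key
fetch_by_match(MyKey, Some1) ->
    mnesia:transaction(fun() ->
        mnesia:match_object({bigdata, MyKey, Some1, '_'})
    end).

%% Option B: ~5000 separate reads, one per "custom" key
fetch_by_reads(Keys) ->
    mnesia:transaction(fun() ->
        lists:append([mnesia:read(bigdata, K) || K <- Keys])
    end).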

Upvotes: 1

Views: 961

Answers (3)

Muzaaya Joshua

Reputation: 7836

The fastest way to return more than 5000 rows depends on the problem in question: What is the database structure? What do we want? What is the record structure? Once those are settled, it boils down to how you write your read functions. If we are sure about the primary key, we use mnesia:read/1 or mnesia:read/2; if not, it is better (and cleaner) to use query list comprehensions (QLC). They are more flexible for searching nested records and for complex conditional queries. See the usage below:

-include_lib("stdlib/include/qlc.hrl").
-record(bigdata, {mykey,some1,some2}).


%% query list comprehensions
select(Q)->
    %% to guard against nested transactions,
    %% and to ensure it also works whether the
    %% table is fragmented or not, we use
    %% mnesia:activity/4

    case mnesia:is_transaction() of
        false -> 
            F = fun(QH)-> qlc:e(QH) end,
            mnesia:activity(transaction,F,[Q],mnesia_frag);
        true -> qlc:e(Q)
    end.

%% to read by a given field or even several
%% you use a list comprehension and pass the guards
%% to filter those records accordingly

read_by_field(some2,Value)->
    QueryHandle = qlc:q([X || X <- mnesia:table(bigdata),
                        X#bigdata.some2 == Value]),
    select(QueryHandle).

%% selecting by several conditions

read_by_several()->
    %% you can pass as many guard expressions

    QueryHandle = qlc:q([X || X <- mnesia:table(bigdata),
                            X#bigdata.some2 =< 300,
                            X#bigdata.some1 > 50
                            ]),
    select(QueryHandle).

%% Its possible to pass a 'fun' which will do the 
%% record selection in the query list comprehension

auto_reader(ValidatorFun)->
    QueryHandle = qlc:q([X || X <- mnesia:table(bigdata),
                        ValidatorFun(X) ==  true]),
    select(QueryHandle).

read_using_auto()->
    %% accept every bigdata record; reject anything else
    F = fun({bigdata, _SomeKey, _Some1, _Some2}) -> true;
           (_) -> false
        end,
    auto_reader(F).
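For completeness, the mnesia:read path mentioned at the top would look roughly like this (a sketch, assuming mykey is the table's primary key):

%% direct lookup when the primary key is known
read_by_key(Key)->
    F = fun(K)-> mnesia:read(bigdata, K) end,
    mnesia:activity(transaction, F, [Key], mnesia_frag).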

So I think that if you want the fastest way, we need more clarification and detail about the problem. Speed depends on many factors!

Upvotes: 0

Devin Cheung

Reputation: 11

Mnesia is more of a key-value storage system, and match_object will traverse all the records in the table to find matches.

To fetch quickly, you should design the storage structure to support the query directly: make some1 a key or an index, then fetch the records with read or index_read.
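Something along these lines (a sketch only; the helper names are illustrative, and the index could also be declared at table creation time with {index, [some1]}):

%% one-time setup: add a secondary index on the some1 field
add_index() ->
    mnesia:add_table_index(bigdata, some1).

%% fetch all records whose some1 field equals Value, via the index
read_by_some1(Value) ->
    mnesia:transaction(fun() ->
        mnesia:index_read(bigdata, Value, some1)
    end).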

Upvotes: 1

Jr0

Reputation: 2173

I'm curious as to the problem you are solving, how many rows are in the table, etc. Without that information this might not be a relevant answer, but...

If you have a bag, then it might be better to use read/2 on the key and then traverse the list of records returned. If possible, it is best to structure your data so that you can avoid select and match altogether.
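For example (a sketch, assuming the table is of type bag and every record of interest shares the same key):

%% one read on the key returns every record stored under it in a bag table
fetch_bag(Key) ->
    case mnesia:transaction(fun() -> mnesia:read(bigdata, Key) end) of
        {atomic, Records} -> Records;
        {aborted, Reason} -> {error, Reason}
    end.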

In general, select/2 is preferred to match_object as it tends to better avoid full table scans. Also, dirty_select is going to be faster than select/2, assuming you do not need transactional support. And, if you can live with the constraints, Mnesia allows you to go against the underlying ETS table directly, which is very fast, but look at the documentation as it is appropriate only in very rarefied situations.
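A rough sketch of the select/2 and dirty_select variants (the match specification below just mirrors the pattern from the question):

%% select/2 with an explicit match specification, inside a transaction
select_rows(MyKey, Some1) ->
    MatchSpec = [{{bigdata, MyKey, Some1, '_'}, [], ['$_']}],
    mnesia:transaction(fun() -> mnesia:select(bigdata, MatchSpec) end).

%% the same query without transactional guarantees
dirty_select_rows(MyKey, Some1) ->
    MatchSpec = [{{bigdata, MyKey, Some1, '_'}, [], ['$_']}],
    mnesia:dirty_select(bigdata, MatchSpec).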

Upvotes: 2
