mingchaoyan
mingchaoyan

Reputation: 8616

Erlang ordered and key-value data struct

I want to implement a real-time score ranking(ordered). I hope I can get every player's score fast(key-value).

Here player_id is key and score is value.

I try to use ordered-set type ETS to store the list of all the player's score, But ordered set orders after the key not the value.

Does Erlang/OTP have some other data struct that can solve my problem?

Upvotes: 6

Views: 2419

Answers (3)

Ray
Ray

Reputation: 83

There are three solutions:

  • ets ordered-set

  • RAM-only mnesia table with secondary index

  • nif

1) ordered-set, records in the ets table should be {score, player_id}, not {player_id, score}, so that they're sorted by score. To get the score for a player, just use match. Although match needs to scan the whole table, it's still pretty fast.

Profiling: assuming there are 10k players, insert 10k records into ets table, then ets:match_object(Table, {'_', PlayerID}). Each match takes 0.7 to 1.1ms, which is more than good enough in most cases. (CPU: i5 750)

  1. match_object is slightly faster than match;
  2. select with a match specification is slower than match, probably because the selection here is quite simple that fun2ms's overhead outweighs its gain. Note that select is generally preferred over match.

2) mnesia table, make it RAM-only and use an secondary index for player_id:

mnesia:create_table(user, [{type, ordered_set}, {attributes, record_info(fields, user)}, {index, [playerid]}])

Average get time using mnesia:read in mnesia:transaction is 0.09ms. But, inserting 10k records is about 180 times slower than its ets counterpart (2820ms vs 15ms).

If neither ets nor mnesia satisfies your performance requirement, resorting to C with nif could be another choice. But personally I think over-optimizing is not worthy here unless it's really your bottleneck.

Upvotes: 3

Pascal
Pascal

Reputation: 14042

What I understand is that you need to maintain a list of pair (Key,Score) with which you want to perform:

  • frequent update of the score,
  • frequently display a complete or partial view of the list sorted by score

I propose you to store your data into 2 different ets:

  • The first one for fast access to the key is a set where you store the Key in the first field, and the Score in the second field.
  • The second is an ordered set where you store a tuple {Score,Key} as key, and no value. This should guarantee the uniqueness of each record a maintain the list ordered by score.

When you need to display scores, the ordered set is sufficient.

when you need to update a score you should use the ets to get the value of the previous score for Key, delete the record {PrevScore,Key} and insert {NewScore,Key} in the ordered set and simply update the first ets with new value.

I tested this solution on a 1 000 000 item list, the update of 1 score takes an average of 3µs on my laptop (windows XP, core i5, 2Go ram ,all disks full and many app running in background). the code I used :

note I used private tables and a single server to access them in order to guarantee the consistency of the 2 tables, so concurrent processes can access to the server (named score) without conflict, the requests will be executed in the order they arrive to the server. It could be possible to answer in priority to any get(N) request with 2 receive blocks.

here is the test result on my home PC (ubuntu 12.04, 8gb ddr, AMD phenom II X6)...

[edit] I modified the update/2 function in order to be synchronous, so the measure is now significant, and the result more understandable. It appears that for table smaller than 10000 records, the overhead of ets management and message passing is preponderant. enter image description here

-module (score).

-export ([start/0]).
-export ([update/2,delete/1,get/1,stop/0]).

score ! {update,M,S,self()},
    receive
        ok -> ok
    end.

delete(M) ->
    score ! {delete,M}.

get(N) ->
    score ! {getbest,N,self()},
    receive
        {ok,L} -> L
    after 5000 ->
        timeout
    end.

stop() ->
    score ! stop.


start() ->
    P = spawn(fun() -> initscore() end),
    register(score,P).


initscore() ->
    ets:new(score,[ordered_set,private,named_table]),
    ets:new(member,[set,private,named_table]),
    loop().

loop() ->
    receive
        {getbest,N,Pid} when is_integer(N), N > 0 ->
            Pid ! {ok,lists:reverse(getbest(N))},
            loop();
            {update,M,S,P} ->
                    update_member(M,S),
                    P ! ok,
            loop();
        {delete,M} ->
            delete_member(M),
            loop();
        stop ->
            ok
    end.



update_member(M,S) ->
    case ets:lookup(member,M) of
        [] -> 
            ok;
        [{M,S1}] ->
            ets:delete(score,{S1,M})
    end,
    ets:insert(score,{{S,M}}),
    ets:insert(member,{M,S}).

delete_member(M) ->
    case ets:lookup(member,M) of
        [] -> 
            ok;
        [{M,S}] ->
            ets:delete(score,{S,M}),
            ets:delete(member,M)
    end.

getbest(N) ->
    K= ets:last(score),
    getbest(N-1,K,[]).

getbest(_N,'$end_of_table',L) -> L;
getbest(0,{S,M},L) -> [{M,S}|L];
getbest(N,K={S,M},L) ->
    K1 = ets:prev(score,K),
    getbest(N-1,K1,[{M,S}|L]).

and the code for test:

-module (test_score).

-compile([export_all]).

init(N) ->
    score:start(),
    random:seed(erlang:now()),
    init(N,10*N).

stop() ->
    score:stop().

init(0,_) -> ok;
init(N,M) ->
    score:update(N,random:uniform(M)),
    init(N-1,M).

test_update(N,M) ->
    test_update(N,M,0).

test_update(0,_,T) -> T;
test_update(N,M,T) -> test_update(N-1,M,T+update(random:uniform(M),random:uniform(10*M))).

update(K,V) ->
    {R,_} = timer:tc(score,update,[K,V]),
    R.

Upvotes: 6

Berzemus
Berzemus

Reputation: 3658

I wouldn't be picky about the way erlang chooses to order it's data: it optimizes itself or fast lookups. What you could do however, is read your ETS table in a list, and use a lists:sort/2 to sort the data.

List  = ets:tab2list(EtsTable),
lists:sort(SortFun,List).

It'll still be fast: ETS tables and lists reside in memory, as long as you have enough of it. However, I'd dump the ordered_set, you'll lose the constant access time

From the ETS manual:

This module is an interface to the Erlang built-in term storage BIFs. These provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data. (In the case of ordered_set, see below, access time is proportional to the logarithm of the number of objects stored).

And don't forget some form of disk-based backup, dets or mnesia (if the data is worth keeping).

Upvotes: 3

Related Questions