Reputation: 141
I have designed a mnesia database with 5 different tables. The idea is to simulate queries from many nodes (computers), not just one. At the moment I can execute a query from the terminal, but I need help on how to make it such that I am requesting information from many computers. I am testing for scalability and want to investigate the performance of mnesia versus other databases. Any idea will be highly appreciated.
Upvotes: 2
Views: 3319
Reputation: 7836
The best way to test mnesia is by running an intensive, heavily concurrent job both on the local Erlang node where mnesia is running and on remote nodes. Usually, you want remote nodes to use RPC calls in which reads and writes are executed against the mnesia tables. Of course, with high concurrency comes a trade-off: transaction speed will drop and many transactions may be retried, since many locks can be held at a given time; but mnesia will ensure that every process receives an {atomic,ok} for each transactional call it makes.
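For instance, a remote node that is connected to the database node could issue its reads and writes along these lines. This is only a minimal sketch: DbNode stands for the name of the node running mnesia, key_value is the example table used later in this answer, and the module containing these funs must also be loaded on DbNode since the funs are executed there.

%% a minimal sketch: DbNode is the node where mnesia and the key_value
%% table (shown later in this answer) are running. The module defining
%% these funs must also be loaded on DbNode, because the funs are
%% evaluated on that node.
remote_write(DbNode, Record)->
    rpc:call(DbNode, mnesia, transaction,
             [fun() -> mnesia:write(Record) end]).

remote_read(DbNode, Key)->
    rpc:call(DbNode, mnesia, transaction,
             [fun() -> mnesia:read({key_value, Key}) end]).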
The Concept
I propose that we create a non-blocking overload, with both writes and reads directed at each mnesia table by as many processes as possible. We measure the time difference between the call to the write function and the time it takes for our mnesia subscriber to receive a Write Event. These events are sent by mnesia after every successful transaction, so we need not interrupt the working/overloading processes but rather let a dedicated mnesia subscriber wait for asynchronous events reporting successful deletes and writes as soon as they occur.
The technique here is that we take a timestamp just before calling the write function and note down the record key together with the write CALL timestamp. Our mnesia subscriber then notes down the record key together with the write/read EVENT timestamp. The difference between these two timestamps (let's call it the CALL-to-EVENT Time) gives us a rough idea of how loaded, or how efficient, we are. As locks increase with concurrency, we should register an increasing CALL-to-EVENT Time. Processes doing writes (unlimited) will do so concurrently, while those doing reads will also continue without interruption. We will choose the number of processes for each operation, but let's first lay the ground for the entire test case.
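As a small illustration of that measurement (only a sketch; CallTs and EventTs stand for the two timestamps described above, taken with erlang:now/0):

%% a minimal sketch: CallTs is taken just before mnesia:write/1 is called,
%% EventTs is taken when our subscriber receives the corresponding mnesia
%% event; both are erlang:now()-style tuples and the result is in microseconds
call_to_event_time(CallTs, EventTs)->
    timer:now_diff(EventTs, CallTs).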
All of the above concept is for local operations (processes running on the same node as mnesia).
--> Simulating Many Nodes
Well, I have personally not simulated nodes in Erlang; I have always worked with real Erlang nodes on the same box or on several different machines in a networked environment. However, I advise that you look closely at this module: http://www.erlang.org/doc/man/slave.html, concentrate more on this one: http://www.erlang.org/doc/man/ct_slave.html, and look at the following links as they talk about creating, simulating and controlling many nodes under another parent node (http://www.erlang.org/doc/man/pool.html, Erlang: starting slave node, https://support.process-one.net/doc/display/ERL/Starting+a+set+of+Erlang+cluster+nodes, http://www.berabera.info/oldblog/lenglet/howtos/erlangkerberosremctl/index.html). I will not dive into a jungle of Erlang nodes here because that is another complicated topic, but I will concentrate on tests on the same node running mnesia. I have come up with the above mnesia test concept, so let's start implementing it.
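If you do want to experiment with spawning extra nodes from within Erlang, here is a minimal sketch using slave:start/2 and mnesia:change_config/2. It assumes the parent node was started with -sname or -name, that mnesia is already running on it, and that the slave names below are only placeholders.

%% a minimal sketch: starts two slave nodes on the local host and connects
%% their mnesia instances to this node's schema. Assumes this (parent) node
%% is alive (started with -sname/-name) and that mnesia is running here.
start_slaves()->
    [_, Host] = string:tokens(atom_to_list(node()), "@"),
    [begin
         {ok, Slave} = slave:start(list_to_atom(Host), Name),
         ok = rpc:call(Slave, mnesia, start, []),
         {ok, _} = rpc:call(Slave, mnesia, change_config,
                            [extra_db_nodes, [node()]]),
         Slave
     end || Name <- [slave1, slave2]].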
Now, first of all, you need to make a test plan for each table (separately). This should include both writes and reads. Then you need to decide whether you want to do dirty operations or transactional operations on the tables. You also need to test the speed of traversing a mnesia table in relation to its size. Let's take an example of a simple mnesia table:
-record(key_value,{key,value,instanceId,pid}).
We would want to have a general function for writing into our table, here below:
write(Record)->
    %% Use mnesia:activity/4 to test several activity
    %% contexts (and if your table is fragmented)
    %% like the commented code below
    %%
    %% mnesia:activity(
    %%      transaction, %% sync_transaction | async_dirty | ets | sync_dirty
    %%      fun(Y) -> mnesia:write(Y) end,
    %%      [Record],
    %%      mnesia_frag
    %% )
    mnesia:transaction(fun() -> ok = mnesia:write(Record) end).
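As an aside on the earlier point about choosing between dirty and transactional operations: a rough way to compare the two contexts is timer:tc/3. The helper below is only a minimal sketch and assumes the key_value table already exists.

%% a minimal sketch: times one transactional write and one dirty write
%% of the same record, returning each elapsed time in microseconds
compare_write_contexts(Record)->
    {TxMicros,_}    = timer:tc(mnesia, transaction,
                               [fun() -> mnesia:write(Record) end]),
    {DirtyMicros,_} = timer:tc(mnesia, dirty_write, [Record]),
    [{transaction,TxMicros},{dirty,DirtyMicros}].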
And for our reads, we will have:
read(Key)->
    %% Use mnesia:activity/4 to test several activity
    %% contexts (and if your table is fragmented)
    %% like the commented code below
    %%
    %% mnesia:activity(
    %%      transaction, %% sync_transaction | async_dirty | ets | sync_dirty
    %%      fun(Y) -> mnesia:read({key_value,Y}) end,
    %%      [Key],
    %%      mnesia_frag
    %% )
    mnesia:transaction(fun() -> mnesia:read({key_value,Key}) end).

Now, we want to write very many records into our small table. We need a key generator. This key generator will be our own pseudo-random string generator. However, we need our generator to tell us the instant it generates a key so that we can record it. We want to see how long it takes to write a generated key. Let's put it down like this:
timestamp()-> erlang:now().

str(XX)-> integer_to_list(XX).

%% pure pseudo-random hex string generation, shared by
%% generate_instance_id/0 and guid/0
guid_str()->
    random:seed(now()),
    MD5 = erlang:md5(term_to_binary({self(),time(),node(),now(),make_ref()})),
    MD5List = binary_to_list(MD5),
    F = fun(N) -> io_lib:format("~2.16.0B",[N]) end,
    lists:flatten([F(N) || N <- MD5List]).

generate_instance_id()->
    guid_str() ++ str(crypto:rand_uniform(1, 65536 * 65536)) ++
        str(erlang:phash2({self(),make_ref(),time()})).

guid()->
    L = guid_str(),
    %% tell our massive mnesia subscriber about this generation
    InstanceId = generate_instance_id(),
    mnesia_subscriber ! {self(),{key,write,L,timestamp(),InstanceId}},
    {L,InstanceId}.

To make very many concurrent writes, we need a function which will be executed by the many processes we will spawn. In this function, it is desirable NOT to put any blocking functions such as
sleep/1, usually implemented as sleep(T)-> receive after T -> true end. Such a function would make a process's execution hang for the specified number of milliseconds. mnesia_tm does the lock control, retries, blocking, etc. on behalf of the processes to avoid deadlocks. Let's say we want each process to write an unlimited number of records. Here is our function:
-define(NO_OF_PROCESSES,20).

start_write_jobs()->
    [spawn(?MODULE,generate_and_write,[]) || _ <- lists:seq(1,?NO_OF_PROCESSES)],
    ok.

generate_and_write()->
    %% remember that in the function ?MODULE:guid/0,
    %% we inform our mnesia_subscriber about our generated key
    %% together with the timestamp of the generation just before
    %% a write is made.
    %% The subscriber will note this down in the benchmark mnesia table
    %% and then wait for the mnesia event about the write operation.
    %% Then it will take the event timestamp and calculate the time
    %% difference. From there we can make a judgement on performance.
    %% In this case, we make the processes perform unlimited writes
    %% into our mnesia tables. Our subscriber will trap the events as soon as
    %% a successful write is made in mnesia.
    %% For all keys we just write a Zero as the value.
    {Key,Instance} = guid(),
    write(#key_value{key = Key,value = 0,instanceId = Instance,pid = self()}),
    generate_and_write().
Likewise, let's see how the read jobs will be done. We will have a key provider; this key provider keeps rotating around the mnesia table picking only keys, traversing up and down the table over and over. Here is its code:
first()-> mnesia:dirty_first(key_value).

next(FromKey)-> mnesia:dirty_next(key_value,FromKey).

start_key_picker()-> register(key_picker,spawn(fun() -> key_picker() end)).

key_picker()->
    try ?MODULE:first() of
        '$end_of_table' ->
            io:format("\n\tTable is empty, my dear !~n",[]),
            %% lets throw something in there to start with
            {NewKey,Instance} = guid(),
            ?MODULE:write(#key_value{key = NewKey,value = 0,instanceId = Instance,pid = self()}),
            key_picker();
        Key -> wait_key_reqs(Key)
    catch
        EXIT:REASON ->
            error_logger:error_report(["Key Picker dies",{EXIT,REASON}]),
            exit({EXIT,REASON})
    end.

wait_key_reqs('$end_of_table')->
    receive
        {From,<<"get_key">>} ->
            Key = ?MODULE:first(),
            From ! {self(),Key},
            wait_key_reqs(?MODULE:next(Key));
        {_,<<"stop">>} -> exit(normal)
    end;
wait_key_reqs(Key)->
    receive
        {From,<<"get_key">>} ->
            From ! {self(),Key},
            NextKey = ?MODULE:next(Key),
            wait_key_reqs(NextKey);
        {_,<<"stop">>} -> exit(normal)
    end.

key_picker_rpc(Command)->
    try erlang:send(key_picker,{self(),Command}) of
        _ ->
            receive
                {_,Reply} -> Reply
            after timer:seconds(60) ->
                %% key_picker hung, or is too busy
                erlang:throw({key_picker,hanged})
            end
    catch
        _:_ ->
            %% key_picker dead
            start_key_picker(),
            timer:sleep(timer:seconds(5)),
            key_picker_rpc(Command)
    end.

%% Now, this is where the reader processes will be
%% accessing keys. It will appear to them as though
%% its random, because its one process doing the
%% traversal. It will all be a game of chance
%% depending on the scheduler's choice of
%% who gets the next read chance, and he will
%% win ! okay, lets get going below :)

get_key()->
    Key = key_picker_rpc(<<"get_key">>),
    %% lets report to our "massive" mnesia subscriber
    %% about a read which is about to happen
    %% together with a time stamp.
    Instance = generate_instance_id(),
    mnesia_subscriber ! {self(),{key,read,Key,timestamp(),Instance}},
    {Key,Instance}.
Wow !!! Now we need to create the function where we will start all the readers.
-define(NO_OF_READERS,10).

start_read_jobs()->
    [spawn(?MODULE,constant_reader,[]) || _ <- lists:seq(1,?NO_OF_READERS)],
    ok.

constant_reader()->
    {Key,InstanceId} = ?MODULE:get_key(),
    case ?MODULE:read(Key) of
        {atomic,[Record]} ->
            %% Tell mnesia_subscriber that a read has been done so it creates a timestamp
            mnesia:report_event({read_success,Record,self(),InstanceId});
        _ -> ok
    end,
    constant_reader().
Now, the biggest part: mnesia_subscriber !!! This is a simple process that subscribes to simple mnesia events. Get the mnesia events documentation from the mnesia user's guide. Here is the mnesia subscriber:
-record(read_instance,{
        instance_id,
        before_read_time,
        after_read_time,
        read_time       %% after_read_time - before_read_time
        }).

-record(write_instance,{
        instance_id,
        before_write_time,
        after_write_time,
        write_time      %% after_write_time - before_write_time
        }).

-record(benchmark,{
        id,             %% {pid(),Key}
        read_instances = [],
        write_instances = []
        }).

subscriber()->
    mnesia:subscribe({table,key_value, simple}),
    %% lets also subscribe for system
    %% events because events passing through
    %% mnesia:report_event/1 will go via
    %% system events.
    mnesia:subscribe(system),
    wait_events().

-include_lib("stdlib/include/qlc.hrl").

wait_events()->
    receive
        {From,{key,write,Key,TimeStamp,InstanceId}} ->
            %% A process is just about to call
            %% mnesia:write/1 and so we note this down
            Fun = fun() ->
                    case qlc:e(qlc:q([X || X <- mnesia:table(benchmark),X#benchmark.id == {From,Key}])) of
                        [] ->
                            ok = mnesia:write(#benchmark{
                                    id = {From,Key},
                                    write_instances = [
                                        #write_instance{
                                            instance_id = InstanceId,
                                            before_write_time = TimeStamp
                                        }]
                                    }),
                            ok;
                        [Here] ->
                            WIs = Here#benchmark.write_instances,
                            NewInstance = #write_instance{
                                            instance_id = InstanceId,
                                            before_write_time = TimeStamp
                                        },
                            ok = mnesia:write(Here#benchmark{write_instances = [NewInstance|WIs]}),
                            ok
                    end
                end,
            mnesia:transaction(Fun),
            wait_events();
        {mnesia_table_event,{write,#key_value{key = Key,instanceId = I,pid = From},_ActivityId}} ->
            %% A process has successfully made a write. So we look it up and
            %% get the timestamp difference, and finish benchmarking that write
            WriteTimeStamp = timestamp(),
            F = fun()->
                    [Here] = mnesia:read({benchmark,{From,Key}}),
                    WIs = Here#benchmark.write_instances,
                    {value,WriteInstance} = lists:keysearch(I,2,WIs),
                    BeforeTmStmp = WriteInstance#write_instance.before_write_time,
                    NewWI = WriteInstance#write_instance{
                                after_write_time = WriteTimeStamp,
                                write_time = time_diff(WriteTimeStamp,BeforeTmStmp)
                            },
                    ok = mnesia:write(Here#benchmark{write_instances = [NewWI|lists:keydelete(I,2,WIs)]}),
                    ok
                end,
            mnesia:transaction(F),
            wait_events();
        {From,{key,read,Key,TimeStamp,InstanceId}} ->
            %% A process is just about to do a read
            %% using mnesia:read/1 and so we note this down
            Fun = fun()->
                    case qlc:e(qlc:q([X || X <- mnesia:table(benchmark),X#benchmark.id == {From,Key}])) of
                        [] ->
                            ok = mnesia:write(#benchmark{
                                    id = {From,Key},
                                    read_instances = [
                                        #read_instance{
                                            instance_id = InstanceId,
                                            before_read_time = TimeStamp
                                        }]
                                    }),
                            ok;
                        [Here] ->
                            RIs = Here#benchmark.read_instances,
                            NewInstance = #read_instance{
                                            instance_id = InstanceId,
                                            before_read_time = TimeStamp
                                        },
                            ok = mnesia:write(Here#benchmark{read_instances = [NewInstance|RIs]}),
                            ok
                    end
                end,
            mnesia:transaction(Fun),
            wait_events();
        {mnesia_system_event,{mnesia_user,{read_success,#key_value{key = Key},From,I}}} ->
            %% A process has successfully made a read. So we look it up and
            %% get the timestamp difference, and finish benchmarking that read
            ReadTimeStamp = timestamp(),
            F = fun()->
                    [Here] = mnesia:read({benchmark,{From,Key}}),
                    RIs = Here#benchmark.read_instances,
                    {value,ReadInstance} = lists:keysearch(I,2,RIs),
                    BeforeTmStmp = ReadInstance#read_instance.before_read_time,
                    NewRI = ReadInstance#read_instance{
                                after_read_time = ReadTimeStamp,
                                read_time = time_diff(ReadTimeStamp,BeforeTmStmp)
                            },
                    ok = mnesia:write(Here#benchmark{read_instances = [NewRI|lists:keydelete(I,2,RIs)]}),
                    ok
                end,
            mnesia:transaction(F),
            wait_events();
        _ -> wait_events()
    end.

%% rough element-wise difference of two erlang:now() tuples;
%% timer:now_diff/2 could be used instead for a difference in microseconds
time_diff({A2,B2,C2} = _After,{A1,B1,C1} = _Before)->
    {A2 - A1,B2 - B1,C2 - C1}.
Alright! That was huge :) So we are done with the subscriber. We now need the code that will crown it all and run the necessary tests.
install()->
    mnesia:stop(),
    mnesia:delete_schema([node()]),
    mnesia:create_schema([node()]),
    mnesia:start(),
    {atomic,ok} = mnesia:create_table(key_value,[
        {attributes,record_info(fields,key_value)},
        {disc_copies,[node()]}
    ]),
    {atomic,ok} = mnesia:create_table(benchmark,[
        {attributes,record_info(fields,benchmark)},
        {disc_copies,[node()]}
    ]),
    mnesia:stop(),
    ok.
start()->
    mnesia:start(),
    ok = mnesia:wait_for_tables([key_value,benchmark],timer:seconds(120)),
    %% boot up our subscriber
    register(mnesia_subscriber,spawn(?MODULE,subscriber,[])),
    start_write_jobs(),
    start_key_picker(),
    start_read_jobs(),
    ok.
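Assuming all of the code above is saved in one module (the module name mnesia_bench below is only a placeholder), a run from the Erlang shell would look something like:

1> c(mnesia_bench).
2> mnesia_bench:install().
3> mnesia_bench:start().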
Now, with proper analysis of the benchmark table records, you will get a record of average read times, average write times, etc. You can draw a graph of these times against an increasing number of processes. As we increase the number of processes, you will discover that the read and write times increase. Get the code, read it and make use of it. You may not use all of it, but I am sure you could pick up new concepts from there as others send in their solutions. Using mnesia events is the best way to test mnesia reads and writes without blocking the processes doing the actual writing or reading. In the example above, the reading and writing processes are out of any control; in fact, they will run forever until you terminate the VM. You can traverse the benchmark table with a good formula to make use of the read and write times per read or write instance, and then calculate averages, variations, etc.
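For example, here is a minimal sketch of such a traversal, assuming the benchmark and write_instance records above and that the stored timestamps are erlang:now()-style tuples; it computes a rough average write CALL-to-EVENT time in microseconds:

%% a minimal sketch: average write CALL-to-EVENT time in microseconds,
%% computed from the before_write_time / after_write_time fields of every
%% completed write_instance in the benchmark table
average_write_time_us()->
    F = fun()->
            mnesia:foldl(
              fun(#benchmark{write_instances = WIs},{Sum,Count})->
                      lists:foldl(
                        fun(#write_instance{before_write_time = B,
                                            after_write_time = A},{S,C})
                              when A =/= undefined ->
                                {S + timer:now_diff(A,B), C + 1};
                           (_,Acc) -> Acc
                        end, {Sum,Count}, WIs)
              end, {0,0}, benchmark)
        end,
    case mnesia:transaction(F) of
        {atomic,{_,0}} -> no_completed_writes;
        {atomic,{Sum,Count}} -> Sum / Count
    end.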
As a consequence, the concepts behind mnesia can only be compared with Ericsson's NDB database, found here: http://ww.dolphinics.no/papers/abstract/ericsson.html, but not with existing RDBMSs, document-oriented databases, etc. Those are my thoughts :) let's wait for what others have to say.....
Upvotes: 8
Reputation: 64
You start additional nodes using a command like this:
erl -name [email protected] -cookie devel \
-mnesia extra_db_nodes "['[email protected]']"\
-s mnesia start
where '[email protected]' is the node where mnesia is already set up. In this case all tables will be accessed from the remote node, but you can make local copies with mnesia:add_table_copy/3.
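For example, to keep a RAM copy of the example key_value table on the newly attached node, you could run something like this in its shell (only a sketch; ram_copies is just one possible storage type):

%% a minimal sketch, run on the newly attached node; key_value is the
%% example table from the answer above, ram_copies one possible storage type
{atomic, ok} = mnesia:add_table_copy(key_value, node(), ram_copies).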
Then you can use spawn/2 or spawn/4 to start load generation on all nodes with something like:
lists:foreach(fun(N) ->
                  spawn(N, fun () ->
                               %% generate some load
                               ok
                           end)
              end,
              [ '[email protected]', '[email protected]' ]).
Upvotes: 0