Duncan Paul

Reputation: 595

Stop and start Erlang tracer without losing trace events

I have a question regarding tracers in Erlang, and how these can be switched on and off without losing any trace events. Suppose I have a process P1 which is being traced using the send and receive trace flags, like so:

    erlang:trace(P1Pid, true, [set_on_spawn, send, 'receive', {tracer, T1Pid}])

Since the set_on_spawn flag is specified, any process P2 spawned by P1 inherits the same flags (set_on_spawn, send, 'receive') and the same tracer T1. Now suppose I want a separate tracer for P2, so that T1 handles traces from P1 and T2 handles traces from P2. Since Erlang allows only one tracer per process, I would first need to unset the inherited trace flags on P2 and then set them again with the new tracer, as follows:

    % Unset trace flags on P2.
    erlang:trace(P2Pid, false, [set_on_spawn, send, 'receive']),

    % Any trace events raised by P2 at this instant, after the old
    % tracer is unset and before the new one is set, might be lost.

    % Now set the trace flags on P2 again, directing the trace to
    % a new tracer T2.
    erlang:trace(P2Pid, true, [set_on_spawn, send, 'receive', {tracer, T2Pid}]).

Between the call that unsets the trace flags and the call that sets them again, any trace events raised by process P2 might be lost. This is a race condition.
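
The inheritance that makes this handover necessary can be checked with erlang:trace_info/2. The following is a minimal sketch, not part of the original setup: the module name trace_inherit and the helper idle/0 are made up for illustration, and the tracer T1 is a dummy process that simply lets trace messages pile up in its mailbox.

```erlang
-module(trace_inherit).
-export([demo/0]).

%% Traces P1 with set_on_spawn, lets P1 spawn a child (P2), and reports
%% which tracer and which flags the child ended up with.
demo() ->
    Self = self(),
    T1 = spawn(fun idle/0),
    P1 = spawn(fun() ->
                   %% Wait until tracing has been switched on.
                   receive go -> ok end,
                   Child = spawn(fun idle/0),
                   Self ! {p2, Child},
                   idle()
               end),
    1 = erlang:trace(P1, true, [set_on_spawn, send, 'receive', {tracer, T1}]),
    P1 ! go,
    P2 = receive {p2, Pid} -> Pid after 1000 -> exit(timeout) end,
    {tracer, Tracer} = erlang:trace_info(P2, tracer),
    {flags, Flags} = erlang:trace_info(P2, flags),
    lists:foreach(fun(P) -> exit(P, kill) end, [P1, P2, T1]),
    %% First element is true when P2 reports T1 as its tracer.
    {Tracer =:= T1, lists:sort(Flags)}.

%% Hypothetical helper: a process that does nothing until told to stop.
idle() ->
    receive stop -> ok end.
```

Calling trace_inherit:demo() returns a tuple whose first element says whether P2 is traced by T1, followed by the sorted list of flags that P2 inherited.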

My question is this: can this be achieved without losing trace events?

Does Erlang provide the means by which this 'tracer handover' (i.e. from T1 to T2) can be done in an atomic fashion?

Alternatively, is it possible to pause the Erlang VM and in doing so, pause tracing, thereby avoid losing trace events?

Upvotes: 0

Views: 231

Answers (1)

Duncan Paul

Reputation: 595

I have looked deeper into the problem and may have found a partial workaround, although it is only semi-desirable (see the points below). While reading the Erlang documentation, I came across the erlang:suspend_process/1 and erlang:resume_process/1 BIFs. Using these two, I can achieve the desired behaviour like so:

    % Suspend process P2. According to the Erlang docs, this function
    % blocks the caller (i.e. the current tracer) until P2 is suspended.
    % This way, we do not lose trace events.
    erlang:suspend_process(P2Pid),

    % Unset trace flags on P2.
    erlang:trace(P2Pid, false, [set_on_spawn, send, 'receive']),

    % We should not lose any trace events from P2, since it is
    % currently suspended, and therefore cannot generate any.
    % However, we can still lose receive trace events that are
    % generated as a result of other processes sending messages
    % to P2.

    % Now set again trace flags on P2, directing the trace to
    % a new tracer T2.
    erlang:trace(P2Pid, true, [set_on_spawn, send, 'receive', {tracer, T2Pid}]),

    % Finally, resume process P2, so that we can receive any trace
    % messages generated by P2 on the new tracer T2.
    erlang:resume_process(P2Pid).

I have three concerns with this method:

  1. The Erlang documentation for erlang:suspend_process/1 and erlang:resume_process/1 explicitly states that these are to be used for debugging purposes only. Why can they not be used in production when, as the example illustrates, we risk losing trace events while switching from tracer T1 to tracer T2 unless process P2 is suspended?
  2. We are interfering with the system's scheduling. Is there a risk associated with this, apart from forgetting to call erlang:resume_process/1 on a previously suspended process?
  3. More importantly, even though we can prevent process P2 from taking any action, we cannot prevent other processes from sending messages to P2. These messages result in {trace, Pid, 'receive', ...} trace events, which might be lost while we are switching tracers. Is there a way to avoid this?
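
The risk of a forgotten resume mentioned in point 2 can be reduced by wrapping the switch in try ... after. The following is only a sketch under assumptions: the module name tracer_handover and the function handover/3 are hypothetical, and it relies, like the snippet above, on clearing all of the tracee's flags detaching the old tracer. It does nothing about the lost 'receive' events from point 3.

```erlang
-module(tracer_handover).
-export([handover/3]).

%% Hypothetical helper: moves Tracee from its current tracer to NewTracer.
%% Flags must list every trace flag currently set on Tracee,
%% e.g. [set_on_spawn, send, 'receive'].
%% Must not be called by Tracee itself, since a process cannot
%% suspend itself with erlang:suspend_process/1.
handover(Tracee, NewTracer, Flags) ->
    %% Blocks until Tracee is actually suspended.
    true = erlang:suspend_process(Tracee),
    try
        %% Clear the inherited flags, then set them again with the new tracer.
        erlang:trace(Tracee, false, Flags),
        erlang:trace(Tracee, true, [{tracer, NewTracer} | Flags]),
        %% Report the tracer now attached, so the caller can verify it.
        erlang:trace_info(Tracee, tracer)
    after
        %% Runs even if one of the calls above raises.
        erlang:resume_process(Tracee)
    end.
```

With the names from the question, the call would be tracer_handover:handover(P2Pid, T2Pid, [set_on_spawn, send, 'receive']), and the caller can match the result against {tracer, T2Pid}.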

NB: A process P that was previously suspended by process P' is automatically resumed if P' (the one that invoked erlang:suspend_process/1) dies.
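
The automatic resume can be observed with a small experiment. This is a sketch only: the module suspend_demo and the echo/0 process are invented for illustration, and the timeouts are arbitrary. A short-lived process suspends P, a ping is sent while P is suspended, and the reply is awaited both before and after the suspending process exits.

```erlang
-module(suspend_demo).
-export([demo/0]).

demo() ->
    Self = self(),
    P = spawn(fun echo/0),
    {Suspender, Ref} = spawn_monitor(fun() ->
        true = erlang:suspend_process(P),
        %% Report P's status as seen right after the suspend call returns.
        Self ! {suspended, erlang:process_info(P, status)},
        receive die -> ok end
    end),
    Status = receive {suspended, S} -> S after 1000 -> exit(timeout) end,
    %% P cannot answer while suspended; the ping stays in its mailbox.
    P ! {ping, Self},
    WhileSuspended = receive pong -> replied after 200 -> silent end,
    %% Let the suspending process die without calling resume_process/1.
    Suspender ! die,
    receive {'DOWN', Ref, process, Suspender, _} -> ok end,
    AfterExit = receive pong -> replied after 1000 -> silent end,
    exit(P, kill),
    {Status, WhileSuspended, AfterExit}.

%% Hypothetical helper: answers every ping with a pong.
echo() ->
    receive {ping, From} -> From ! pong, echo() end.
```

If the behaviour is as documented, the second element of the result is silent and the third is replied, meaning P only answered once the process that suspended it had exited.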

Upvotes: 0
