Andreas Wederbrand

Reputation: 39951

Plotting multiple timebased on/off

I have a large series of database connections that are in use for a while and then resting before being called again.

I wish to plot their usage over time, so I'm imagining one line per connection that turns on and off as time passes.

The data can be formatted however is needed, but as an example it could look like this:

2018-03-01 20:31:00,000Z foo start
2018-03-01 20:31:00,100Z bar start
2018-03-01 20:31:00,300Z bar stop
2018-03-01 20:31:00,400Z foo stop
2018-03-01 20:31:00,600Z bar start
2018-03-01 20:31:00,900Z bar stop

And the plot would look like this

foo ****
bar  **   ***
    0123456789

where each number at the bottom denotes 100 milliseconds

Upvotes: 1

Views: 50

Answers (1)

ewcz

Reputation: 13087

I think that this is beyond what "pure" Gnuplot can offer. However, one might preprocess the data file to make it more digestible for Gnuplot. For instance, the sample script below passes through the data, expresses each date/time in units of 100 milliseconds, and records for each event (foo/bar) and each time t the net change in the number of running instances at t (+1 for a start, -1 for a stop). After loading the entire file, it walks through these changes in time order and prints, for each event, every interval on which at least 1 instance is running, together with the number of running instances. This has the advantage that it also supports overlapping events of the same type (i.e., simultaneous connections).

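#!/usr/bin/env python3
import datetime
import sys

t_min = sys.maxsize
t_max = -t_min
# event name -> {time in units of 100 ms -> net change in running instances}
events = {}
with open(sys.argv[1], 'r') as F:
    for line in F:
        fields = line.split()
        if len(fields) != 4: continue  # skip blank or malformed lines
        date, time, event, etype = fields
        if etype not in ('start', 'stop'): continue

        # the trailing "Z" marks UTC, so attach the timezone explicitly
        t = datetime.datetime.strptime('{date:s} {time:s}'.format(date = date, time = time), '%Y-%m-%d %H:%M:%S,%fZ')
        t = t.replace(tzinfo = datetime.timezone.utc).timestamp()
        # units of 100 ms; round() avoids truncating e.g. 2.9999999 down to 2
        t = round(t*10)
        t_min = min(t_min, t)
        t_max = max(t_max, t)

        if event not in events: events[event] = {}
        if t not in events[event]: events[event][t] = 0
        events[event][t] += (1 if etype == 'start' else -1)

unique_events = sorted(events.keys())
for eid, event in enumerate(unique_events):
    print('#%d\t%s' % (eid, event))

    ts = sorted(events[event].keys())

    # multiplicity = number of instances running since t_prev
    multiplicity, t_prev = 0, 0
    for t_curr in ts:
        f = events[event][t_curr]
        t_curr -= t_min  # shift so that the first timestamp in the file is 0

        # one two-row block (interval start, interval end) per active interval
        if multiplicity > 0:
            print('{t_prev:d}\t{eid:d}\t{multiplicity:d}\n{t_curr:d}\t{eid:d}\t{multiplicity:d}\n'.format(t_prev = t_prev, t_curr = t_curr, eid = eid, multiplicity = multiplicity))

        # an unmatched stop never pushes the count below zero
        multiplicity = max(multiplicity + f, 0)
        t_prev = t_curr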
For sample data of:

2018-03-01 20:31:00,000Z foo start
2018-03-01 20:31:00,000Z foo start
2018-03-01 20:31:00,100Z bar start
2018-03-01 20:31:00,300Z bar stop
2018-03-01 20:31:00,400Z foo stop
2018-03-01 20:31:00,600Z bar start
2018-03-01 20:31:00,900Z bar stop
2018-03-01 20:31:00,900Z foo stop

this would produce:

#0  bar
1   0   1
3   0   1

6   0   1
9   0   1

#1  foo
0   1   2
4   1   2

4   1   1
9   1   1

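Here the first column is the time in units of 100 milliseconds (relative to the first timestamp in the file), the second is the event id, and the third is the number of running instances. This means, for example, that event 1 (foo) had 2 instances running on the interval [0, 4], while there was only 1 instance on [4, 9]. This output is then directly processable by Gnuplot: each interval is a two-row block followed by a blank line, and Gnuplot does not join points separated by a blank line, so plotting the second column against the first with lines draws one horizontal segment per interval.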
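As a quick sanity check before handing the file to Gnuplot, the same output can be turned back into the text plot from the question. This is only a minimal sketch: the names `parse_intervals`, `render` and `SAMPLE` are made up for illustration, and it assumes that an interval from t1 to t2 fills the cells t1 .. t2-1, which is how the plot in the question appears to be drawn.

```python
from collections import defaultdict

# Output of the preprocessing script for the sample data (whitespace separated).
SAMPLE = """\
#0 bar
1 0 1
3 0 1

6 0 1
9 0 1

#1 foo
0 1 2
4 1 2

4 1 1
9 1 1
"""

def parse_intervals(text):
    """Return {event name: [(start, stop, count), ...]} from the script output."""
    names = {}                  # event id -> name, taken from the '#id name' lines
    intervals = defaultdict(list)
    start = None                # first row of the current two-row block, if any
    for line in text.splitlines():
        line = line.strip()
        if not line:            # a blank line closes the current block
            start = None
        elif line.startswith('#'):
            eid, name = line[1:].split(None, 1)
            names[int(eid)] = name
        else:
            t, eid, count = (int(x) for x in line.split())
            if start is None:
                start = t
            else:
                intervals[names[eid]].append((start, t, count))
                start = None
    return dict(intervals)

def render(intervals):
    """Draw one row per event; assumes [start, stop) fills cells start..stop-1."""
    width = max(stop for spans in intervals.values() for _, stop, _ in spans)
    label_width = max(len(name) for name in intervals)
    rows = []
    for name, spans in intervals.items():
        cells = [' '] * width
        for start, stop, _count in spans:
            for t in range(start, stop):
                cells[t] = '*'
        rows.append(name.ljust(label_width) + ' ' + ''.join(cells).rstrip())
    return '\n'.join(rows)

print(render(parse_intervals(SAMPLE)))
```

For the sample above the bar row comes out as in the question, while foo stays on for the whole range because its second instance only stops at 9. The instance count is parsed but not drawn; it could be mapped to a different character if overlapping connections should be visible.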

Upvotes: 1
