Reputation: 39951
I have a large series of database connections that are in use for a while and then resting before being called again.
I wish to plot their usage over time, so I'm imagining one line per connection that turns on and off as time passes.
The data can be formatted basically however is needed, but as an example it could look like this
2018-03-01 20:31:00,000Z foo start
2018-03-01 20:31:00,100Z bar start
2018-03-01 20:31:00,300Z bar stop
2018-03-01 20:31:00,400Z foo stop
2018-03-01 20:31:00,600Z bar start
2018-03-01 20:31:00,900Z bar stop
And the plot would look like this
foo ****
bar  **   ***
    0123456789
where each number at the bottom denotes 100 milliseconds
Upvotes: 1
Views: 50
Reputation: 13087
I think that this is beyond what "pure" Gnuplot can offer. However, one might preprocess the data file in order to make it more digestible for Gnuplot. For instance, the sample script below passes through the data, expresses the date/time in units of 100 milliseconds, and for each event (foo/bar) marks, for the corresponding time t, how many instances of said event are active at time t. After loading the entire file, it processes this data and prints for each event all intervals on which the event has at least 1 instance running. This has the advantage that it also supports overlapping events of the same type (i.e., simultaneous connections).
#!/usr/bin/env python
import datetime
import sys

t_min = sys.maxsize
t_max = -t_min
events = {}

# First pass: convert each timestamp to units of 100 ms and record, per event
# name and time, the net number of instances starting (+1) or stopping (-1).
with open(sys.argv[1], 'r') as F:
    for line in F:
        date, time, event, etype = map(lambda s: s.strip(), line.strip().split())
        if not etype in ['start', 'stop']: continue
        t = datetime.datetime.strptime('{date:s} {time:s}'.format(date = date, time = time), '%Y-%m-%d %H:%M:%S,%fZ').timestamp()
        t = int(t*10)
        t_min = min(t_min, t)
        t_max = max(t_max, t)
        if not event in events: events[event] = {}
        if not t in events[event]: events[event][t] = 0
        events[event][t] += (1 if etype == 'start' else -1)

# Second pass: for each event, walk through the times in order and print every
# interval during which at least one instance is running, together with the
# event id and the current multiplicity.
unique_events = sorted(events.keys())
for eid, event in enumerate(unique_events):
    print('#%d\t%s' % (eid, event))
    ts = sorted(events[event].keys())
    multiplicity, t_prev = 0, 0
    for t_curr in ts:
        f = events[event][t_curr]
        t_curr -= t_min
        if multiplicity > 0:
            print('{t_prev:d}\t{eid:d}\t{multiplicity:d}\n{t_curr:d}\t{eid:d}\t{multiplicity:d}\n'.format(t_prev = t_prev, t_curr = t_curr, eid = eid, multiplicity = multiplicity))
        multiplicity = max(multiplicity + f, 0)
        t_prev = t_curr
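Assuming the script is saved as, say, preprocess.py and the log lines as connections.log (both names are just placeholders), it reads the log file given as its first argument and writes the preprocessed data to standard output, e.g. python preprocess.py connections.log > connections.dat.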
For sample data of:
2018-03-01 20:31:00,000Z foo start
2018-03-01 20:31:00,000Z foo start
2018-03-01 20:31:00,100Z bar start
2018-03-01 20:31:00,300Z bar stop
2018-03-01 20:31:00,400Z foo stop
2018-03-01 20:31:00,600Z bar start
2018-03-01 20:31:00,900Z bar stop
2018-03-01 20:31:00,900Z foo stop
this would produce:
#0 bar
1 0 1
3 0 1

6 0 1
9 0 1

#1 foo
0 1 2
4 1 2

4 1 1
9 1 1
which means, for example, that event 1 (foo) had 2 instances running on the interval [0, 4], while there was only 1 instance on [4, 9]. This output is then directly processable by Gnuplot.
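As a rough sketch of the plotting step (the file name connections.dat, the y-range and the tic labels are only assumptions based on the sample data above), one could draw one horizontal on/off line per connection along these lines; the blank separator lines the script emits make Gnuplot lift the pen between intervals:
set xlabel "time [100 ms]"
set yrange [-0.5:1.5]
set ytics ("bar" 0, "foo" 1)    # event ids, taken from the '#N name' comment lines
plot 'connections.dat' using 1:2 with lines linewidth 5 notitle
The third column (the multiplicity) could additionally be mapped onto, e.g., the line color with using 1:2:3 with lines linecolor variable if one also wants to see how many simultaneous instances are active.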
Upvotes: 1