slashdottir

Reputation: 8536

Stacked histogram with time series data with gnuplot?

I have a lot of data like this

 callr |    method  | call_count |    day     
 ------+------------+------------+------------
 foo   | find_paths |      10    | 2016-10-10
 bar   | find_paths |      100   | 2016-10-10
 foo   | find_all   |      123   | 2016-10-10
 foo   | list_paths |     2243   | 2016-10-10
 foo   | find_paths |      234   | 2016-10-11
 foo   | collect    |      200   | 2016-10-11
 bar   | collect    |       1    | 2016-10-11
 baz   | collect    |        3   | 2016-10-11
 ...      ...             ...        ...

And I want to create a stacked histogram for each method showing continuous days along the bottom and stacked bars for each day with callers and number of calls.

If I transform the data, e.g.

select method, sum(call_count), day from foo where method='collect' group by method, day order by method, day;

I'm able to get a bar chart with all the calls for one method in one color, using a plg file like this:

set terminal png
set title "Method: " . first_arg
set output "" . first_arg . ".png"
set datafile separator '|'
set style data boxes
set style fill solid
set boxwidth 0.5
set xdata time
set timefmt "%Y-%m-%d"
set format x "%a %m-%d"
xstart="2016-10-01"
xend="2017-01-01"
set xrange [xstart:xend]
set xlabel "Date" tc ls 8  offset -35, -3
set ylabel "Calls"  tc ls 8

plot '<cat' using 3:4

called like this:

cat file | gnuplot -p -e "plot '<cat';first_arg='collect'" calls.plg

[Image: histogram of all calls]

However, what I really want is a way to show the breakdown by caller in the same sort of graph. I can't get the stacked histogram using gnuplot yet. Everything I've tried complains about the using statement, e.g. 'Need full using spec for x time data' or the like.

I want something like the image below, but with the days continuous along the bottom, so that a day with no calls simply has no histogram bar.

[Image: example of the desired stacked histogram]

Thank you for any ideas

Upvotes: 2

Views: 1544

Answers (2)

Rob

Reputation: 183

Refer to https://psy.swansea.ac.uk/staff/Carter/gnuplot/gnuplot_time_histograms.htm for a practical solution, especially its final section, "Boxes Plot". The approach is to add the columns together in the using expression ($2+$3...) and plot the sums explicitly "with boxes" instead of using the histogram style.
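Summing columns like $2+$3 presupposes one row per day with one column per caller, whereas the question's data has one row per caller, method and day. A minimal sketch of that reshaping with awk, assuming whitespace-separated input with a header row and the column order caller, method, call_count, day. The function name pivot_calls is hypothetical and does not come from the linked page, and days are printed in the order they first appear, so the input is assumed to be sorted by day already:

```shell
# pivot_calls METHOD FILE
# Hypothetical helper: turns long rows (caller method call_count day) into
# one row per day with one column per caller, for a single method.
pivot_calls() {
    awk -v method="$1" '
        NR == 1 { next }                          # skip the header row
        $2 == method {
            # remember callers and days in first-seen order
            if (!($1 in seen_caller)) { seen_caller[$1] = 1; callers[++nc] = $1 }
            if (!($4 in seen_day))    { seen_day[$4] = 1;    days[++nd] = $4 }
            count[$4, $1] += $3                   # sum calls per (day, caller)
        }
        END {
            printf "day"
            for (c = 1; c <= nc; c++) printf " %s", callers[c]
            printf "\n"
            for (d = 1; d <= nd; d++) {
                printf "%s", days[d]
                # a caller with no calls on that day is printed as 0
                for (c = 1; c <= nc; c++) printf " %d", count[days[d], callers[c]]
                printf "\n"
            }
        }
    ' "$2"
}
```

For the sample rows in the question, `pivot_calls find_paths calls` would print a header line `day foo bar` followed by `2016-10-10 10 100` and `2016-10-11 234 0`, which is the layout the summed-column plot expects.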

Upvotes: 0

torbiak

Reputation: 146

Combine the data for each day using smooth freq and a bin() function that rounds epoch times down to whole days. Then plot cumulative sums of the callers as boxes, tallest first, using an inline for loop and a sum expression, so that the visible part of each box equals that caller's own value. The tallest box has height foo+bar+baz (caller=3), the next tallest foo+bar (caller=2), and the shortest is just foo (caller=1).
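To make the rounding step concrete, here is a small shell sketch of the same arithmetic as bin(), assuming whole-second epoch times and a bin width of 86400 seconds (one UTC day). The name bin_day is hypothetical and used only for illustration:

```shell
# bin_day EPOCH_SECONDS
# Rounds an epoch time down to the start of its UTC day by subtracting the
# remainder of a division by 86400 (the number of seconds in a day).
bin_day() {
    echo $(( $1 - $1 % 86400 ))
}
```

For example, `bin_day 1476102896` prints 1476057600, which is 2016-10-10 00:00:00 UTC, so every timestamp within that day lands in the same bin.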

calls:

caller  method      call_count  day
foo     find_paths  10          2016-10-10
bar     find_paths  100         2016-10-10
foo     find_all    123         2016-10-10
foo     list_paths  2243        2016-10-10
foo     find_paths  234         2016-10-11
foo     collect     200         2016-10-11
bar     collect     1           2016-10-11
baz     collect     3           2016-10-11

gnuplot script:

binwidth = 86400
bin(t) = (t - (int(t) % binwidth))
date_fmt = "%Y-%m-%d"
time = '(bin(timecolumn(4, date_fmt)))'

# Set absolute boxwidth so all boxes get plotted fully. Otherwise boxes at the
# edges of the range can get partially cut off, which I think looks weird.
set boxwidth 3*binwidth/4 absolute

set key rmargin
set xdata time
set xtics binwidth format date_fmt time rotate by -45 out nomirror
set style fill solid border lc rgb "black"

callers = system("awk 'NR != 1 {print $1}' calls \
    | sort | uniq -c | sort -nr | awk '{print $2}'")
# Or, if Unix tools aren't available:
# callers = "foo bar baz"

plot for [caller=words(callers):1:-1] 'calls' \
    u @time:(sum [i=1:caller] \
        strcol("caller") eq word(callers, i) ? column("call_count") : 0) \
    smooth freq w boxes t word(callers, caller)
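As a cross-check on what the plot command draws, the cumulative box heights can also be computed outside gnuplot. This is a rough sketch, assuming the whitespace-separated `calls` file shown above and callers passed in stacking order (bottom first); the function name box_heights is hypothetical:

```shell
# box_heights FILE CALLER...
# Hypothetical helper: prints, for each day, the running sums that the plot
# draws as boxes, one number per caller in the order given.
box_heights() {
    file=$1; shift
    awk -v callers="$*" '
        BEGIN { n = split(callers, name, " ") }   # caller names, bottom first
        NR == 1 { next }                          # skip the header row
        {
            if (!($4 in seen)) { seen[$4] = 1; days[++nd] = $4 }
            total[$4, $1] += $3                   # calls per (day, caller)
        }
        END {
            for (d = 1; d <= nd; d++) {
                line = days[d]; running = 0
                for (c = 1; c <= n; c++) {
                    running += total[days[d], name[c]]
                    line = line " " running       # height of box number c
                }
                print line
            }
        }
    ' "$file"
}
```

With the sample data, `box_heights calls foo bar baz` prints `2016-10-10 2376 2476 2476` and `2016-10-11 434 435 438`. The differences between neighbouring numbers (for example 435 - 434 = 1 for bar on 2016-10-11) are the per-caller values that remain visible once the shorter boxes are drawn over the taller ones.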

[Image: Calls per day, by caller]

I wrote a longer discussion about gnuplot time-series histograms here: Time-series histograms: gnuplot vs matplotlib

Upvotes: 2
