Aceman

How to create a histogram with varying bin widths

I have been unsuccessful with other approaches that use histogram plots. A simple example uses the following data:

age range   frequency   central band width   bin width   height
1-4         30          2.5                  3           10
5-6         20          5.5                  1           20
7-17        30          12                   10          3

Age goes along the x axis on a linear scale, so the 1-4 bin should be 3 wide with height 10, the 5-6 bin 1 wide with height 20, and the 7-17 bin 10 wide with height 3 (in each case the height is the frequency divided by the bin width).
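
The heights in the table are the frequency divided by the bin width, which keeps the area of each box equal to its count. A minimal Python check of that arithmetic, using the three rows above (the variable names are illustrative only):

```python
# Rows from the table above: (age range, frequency, bin width)
rows = [("1-4", 30, 3), ("5-6", 20, 1), ("7-17", 30, 10)]

# For a histogram with unequal bins, height = frequency / bin width,
# so the area of each box (height * width) equals its frequency.
heights = {label: freq / width for label, freq, width in rows}

for label, freq, width in rows:
    print(f"{label:>5}: height {heights[label]:g}, area {heights[label] * width:g}")
```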

How can I put this data into a .dat file (written in Word or Notepad)? And how can I then use that file to set up a histogram in gnuplot?

Answers (2)

Kip Ingram

I managed a fairly nice example of variable-width boxes last night. I was plotting latency histogram data produced by the FIO storage performance test package. With my compile options I have 1856 bins, which go as follows:

  • 1 ns wide from 0-128 ns (128 bins)
  • 2 ns wide from 128-256 ns (64 bins)
  • 4 ns wide from 256-512 ns (64 bins)
  • 8 ns wide from 512-1024 ns (64 bins)
  • etc.
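
The layout in the list can be generated programmatically. The sketch below assumes, as the "etc." suggests, that after the first 128 one-nanosecond bins every further group of 64 bins is twice as wide as the previous group; under that assumption 1856 bins works out to the first 128 plus 27 groups of 64. The function name `fio_bin_edges` is hypothetical.

```python
def fio_bin_edges(total_bins=1856, first_run=128, group_size=64):
    """Return (start_ns, width_ns) for each latency bin.

    Assumes the pattern from the list above: `first_run` bins 1 ns wide,
    then groups of `group_size` bins whose width doubles with each group.
    """
    bins = []
    start, width = 0, 1
    for i in range(total_bins):
        # After the initial run, the width doubles at the start of each new group
        if i >= first_run and (i - first_run) % group_size == 0:
            width *= 2
        bins.append((start, width))
        start += width
    return bins


bins = fio_bin_edges()
# First bin of each of the first few width groups
print(bins[0], bins[128], bins[192], bins[256])
```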

My latency values at plot time are in microseconds (FIO provides nanoseconds, but I wanted microseconds for historical reasons). I did not have the opportunity to include the bin widths in my data, so I computed them in the plot command instead:

# 1000.0 (not 1000) so gnuplot does floating-point rather than integer division
f(x) = (2**(int(log(x*1000)/log(2))-6))/1000.0
plot "temp" u 1:2:(f(column(1))) with boxes fs transparent solid 0.7 noborder title "$legend"$base_plot

The f(x) definition returns the box width for a given latency. It works as follows:

  1. First, x*1000 gets me back to nanoseconds.
  2. log(x*1000)/log(2) takes the base 2 logarithm of the nanosecond count.
  3. The int() just gives me the integer part of that. Note that now for, say, 128 ns, I'd have 7.
  4. The -6 gets me to the base 2 log of the bin width (each range from 2^k to 2^(k+1) ns holds 64 = 2^6 bins, so its bins are 2^(k-6) ns wide).
  5. The 2 ** gets me to the bin width.
  6. The /1000 returns me from nanoseconds to microseconds.
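
The same steps, transcribed into Python so the widths can be checked outside gnuplot. This is a sketch that mirrors the gnuplot expression with a divisor of 1000, as in step 6; `box_width_us` is a hypothetical name. Python's `/` always performs true division, whereas in gnuplot dividing one integer by another truncates (1/2 is 0), so a floating-point divisor such as 1000.0 is the safe choice there.

```python
import math


def box_width_us(x_us):
    """Bin width in microseconds for a latency x_us given in microseconds."""
    ns = x_us * 1000                       # 1. back to nanoseconds
    log2_ns = math.log(ns) / math.log(2)   # 2. base-2 logarithm
    k = int(log2_ns)                       # 3. integer part (7 for 128..255 ns)
    width_ns = 2 ** (k - 6)                # 4-5. bin width in nanoseconds
    return width_ns / 1000                 # 6. back to microseconds


for x in (0.1, 0.2, 0.3, 1.5):
    print(x, box_width_us(x))
```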

Then I just use f(latency) in the plot command as the box width.

This seems to work perfectly as far as I can tell. It would not give the right result for x < 64 ns (those bins are 1 ns wide, but the formula returns less), but I don't have any data that small, so it works out. A conditional expression could be used to patch it up for that part of the range.
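
One way to patch the low end, sketched in Python under the assumption that every bin below 128 ns is 1 ns wide (as the list above states), is to return that width directly before taking the logarithm; this also avoids log(0). In gnuplot the same guard could be written with the ternary operator `cond ? a : b`. The name `patched_width_us` is hypothetical.

```python
import math


def patched_width_us(x_us):
    """Bin width in microseconds, valid over the whole range including x < 64 ns."""
    ns = x_us * 1000
    if ns < 128:
        # The first 128 bins are all 1 ns wide; this also avoids log(0)
        return 1 / 1000
    k = int(math.log(ns) / math.log(2))
    return 2 ** (k - 6) / 1000


for x in (0.0, 0.03, 0.2):
    print(x, patched_width_us(x))
```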

I think the key observations here are that a) you don't have to store the width as literal data: if you can calculate it from the data you do have, you're golden, and b) column(n) is an alternative to $n for referring to column values in the plot command. In my case all of this is in a bash script, and bash would otherwise have expanded $1 itself.

Christoph

I would use the following data file format (use only whitespace to delimit fields):

"age range" "frequency" "central band width" "bin width" "height"
1-4         30          2.5                  3           10
5-6         20          5.5                  1           20
7-17        30          12                   10          3
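
If you would rather not type the file by hand (a Word document is not plain text, so a plain-text editor such as Notepad is the safer choice), a short Python script can write it. This is a sketch: the file name `file.txt` matches the plotting script in this answer, and the columns are the ones shown above.

```python
# Rows from the question: age range, frequency, bin centre, bin width, height
rows = [
    ("1-4", 30, 2.5, 3, 10),
    ("5-6", 20, 5.5, 1, 20),
    ("7-17", 30, 12, 10, 3),
]

header = '"age range" "frequency" "central band width" "bin width" "height"'

with open("file.txt", "w") as fh:
    fh.write(header + "\n")
    for row in rows:
        # Fields separated by spaces only, which gnuplot reads by default
        fh.write("  ".join(str(v) for v in row) + "\n")
```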
  • To plot with variable boxwidth, use the boxes plotting style. That allows you to use the value from a column as width.
  • With xtic(1) you use the entry in the first column as xticlabel.

So a rather simple plotting script looks as follows:

set style fill solid noborder
set yrange [0:*]
set offset 1,1,1,0
plot 'file.txt' using 3:5:4:xtic(1) with boxes notitle

The result with version 4.6.3 and the pngcairo terminal is:

[Image: resulting histogram]
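
With `using 3:5:4`, the `boxes` style centres each box on the value in column 3, gives it the height in column 5, and takes its full width from column 4, so each box spans centre ± width/2. The Python sketch below (the helper name `box_extents` is hypothetical) reads the same file contents and shows that these spans reproduce the age ranges.

```python
import shlex

DATA = '''"age range" "frequency" "central band width" "bin width" "height"
1-4         30          2.5                  3           10
5-6         20          5.5                  1           20
7-17        30          12                   10          3
'''


def box_extents(text):
    """Return (label, left, right, height) for each data row."""
    boxes = []
    for line in text.splitlines()[1:]:      # skip the quoted header line
        label, _freq, centre, width, height = shlex.split(line)
        c, w = float(centre), float(width)
        boxes.append((label, c - w / 2, c + w / 2, float(height)))
    return boxes


for label, left, right, height in box_extents(DATA):
    print(f"{label:>5}: x from {left:g} to {right:g}, height {height:g}")
```

Note the gaps between 4 and 5 and between 6 and 7: the boxes follow the age ranges exactly as given.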