Brent Bradburn

Reputation: 54929

How to histogram a numeric variable?

I want to produce a simple histogram of a numeric variable X.

I'm having trouble finding a clear example.

Because the histogram needs to be meaningful rather than beautiful, I would prefer to specify the bin size myself instead of letting the tool decide. See: Data Scientists: STOP Randomly Binning Histograms

Upvotes: 4

Views: 6356

Answers (1)

Brent Bradburn

Reputation: 54929

Histograms are a primary tool for understanding the distribution of data. As such, Splunk automatically creates a histogram by default for raw event queries. So it stands to reason that Splunk should provide tools for you to create histograms of your own variables extracted from query results.

The reason this is hard to find may be that the basic answer is very simple:

(your query) |rename (your value) as X
|chart count by X span=1.0

Select "Visualization" and set chart type to "Column Chart" for a traditional vertical-bar histogram.

There is an example of this in the docs described as "Chart the number of transactions by duration".

The span value is used to control binning of the data. Adjust this value to optimize your visualization.
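To make the effect of span concrete, here is a minimal Python sketch of fixed-width binning. It is an illustration only, not Splunk's implementation: the function name `histogram`, the sample data, and the choice to label each bin by its lower edge are all assumptions.

```python
import math
from collections import Counter

def histogram(values, span):
    """Count values per fixed-width bin; each key is a bin's lower edge."""
    counts = Counter(math.floor(v / span) * span for v in values)
    return dict(sorted(counts.items()))

sample = [0.2, 0.7, 1.1, 3.9, 4.0, 9.5]  # made-up data
narrow = histogram(sample, 1.0)  # six values spread over five bins
wide = histogram(sample, 5.0)    # the same values collapse into two bins
print(narrow)
print(wide)
```

A larger span merges neighboring bins, which smooths the shape but hides detail; a smaller span does the opposite.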

Warning: It is legal to omit span, but doing so compacts the X-axis non-linearly to eliminate empty bins. This can cause confusion if you do not check the bin labels carefully (assuming they are drawn at all).


If you have a long-tail distribution, it may be useful to partition the results to focus on the range of interest. This can be done using where:

(your query) |rename (your value) as X
|where X>=0 and X<=100
|chart count by X span=1.0

Alternatively, use a clamping function to preserve the out-of-range counts:

(your query) |rename (your value) as X
|eval X=max(0,min(X,100))
|chart count by X span=1.0
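The difference between the two approaches can be sketched in Python. This is only an illustration of the logic; the helper names `keep_in_range` and `clamp` and the sample values are hypothetical.

```python
def keep_in_range(values, lo, hi):
    """Mirror the `where` filter: drop anything outside [lo, hi]."""
    return [v for v in values if lo <= v <= hi]

def clamp(values, lo, hi):
    """Mirror max(lo, min(X, hi)): pull outliers onto the nearest edge."""
    return [max(lo, min(v, hi)) for v in values]

sample = [-5, 3, 42, 100, 250, 9000]  # made-up data with a long tail
filtered = keep_in_range(sample, 0, 100)
clamped = clamp(sample, 0, 100)
print(filtered)  # out-of-range values are gone
print(clamped)   # out-of-range values pile up in the edge bins
```

With clamping, the total count is preserved, so the first and last bins show how much data fell outside the range. Keep in mind that those edge bins then mix genuine edge values with outliers.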

Another way to deal with long tails is to use a logarithmic span mode: special values for span include log2 and log10 (documented as log-span).
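As a rough idea of what base-10 logarithmic binning does, the Python sketch below assigns each positive value to a bin that starts at a power of 10. The function name `log10_bin` and the data are assumptions, and Splunk's exact bin boundaries and labels may differ.

```python
import math
from collections import Counter

def log10_bin(value):
    """Return the lower edge (a power of 10) of the bin holding a positive value."""
    if value <= 0:
        raise ValueError("logarithmic bins are only defined for positive values")
    return 10 ** math.floor(math.log10(value))

sample = [1, 3, 7, 42, 250, 9000]  # made-up long-tailed data
log_counts = dict(sorted(Counter(log10_bin(v) for v in sample).items()))
print(log_counts)
```

Each bin is ten times wider than the previous one, so a long tail occupies a handful of bins instead of stretching the axis.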


If you would like both a non-default span and a compressed X-axis, there is probably a parameter for that, but the documentation is cryptic.

This two-stage approach produced that result for me:

(your query) |rename (your value) as X
|bin X span=10.0 as X
|chart count by X

Again, this type of chart can be dangerously misleading if you don't pay careful attention to the labels.
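To see why a compressed axis can mislead, the following Python sketch contrasts counting only the non-empty bins with filling in the empty ones. The function names and data are hypothetical, and the sketch does not reproduce Splunk's rendering.

```python
import math
from collections import Counter

def nonempty_bins(values, span):
    """Compressed axis: only bins that contain at least one value."""
    counts = Counter(math.floor(v / span) * span for v in values)
    return sorted(counts.items())

def linear_bins(values, span):
    """Linear axis: every bin from the first to the last, empty ones included."""
    counts = Counter(math.floor(v / span) for v in values)
    first, last = min(counts), max(counts)
    return [(i * span, counts.get(i, 0)) for i in range(first, last + 1)]

sample = [1, 4, 12, 95]  # made-up data with a large gap
compressed = nonempty_bins(sample, 10)
linear = linear_bins(sample, 10)
print(compressed)
print(linear)
```

In the compressed result the bin starting at 90 sits directly next to the bin starting at 10, so the gap between them is visible only in the labels.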

Upvotes: 8
