Tiago Freitas Pereira

Reputation: 690

Tensorboard histograms to matplotlib

I would like to "dump" the TensorBoard histograms and plot them with matplotlib, so that I get plots better suited to a scientific paper.

I managed to hack my way through the summary file using tf.train.summary_iterator and dump the histogram I wanted (a tensorflow.core.framework.summary_pb2.HistogramProto object). By doing that and reimplementing what the JavaScript code does with the data (https://github.com/tensorflow/tensorboard/blob/c2fe054231fe77f3a5b05dbc519f713d2e738d1c/tensorboard/plugins/histogram/tf_histogram_dashboard/histogramCore.ts#L104), I got something similar to the TensorBoard plots (same trends), but not exactly the same plot.
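
For readers trying the same thing, here is a rough sketch of the rebinning step, assuming the dashboard spreads each bucket's count over a fixed number of equal-width bins in proportion to how much the bucket overlaps each bin. That is one reading of the linked histogramCore.ts, not a line-by-line port. The name rebin_uniform, the (left, right, count) input format, and the default of 30 bins are assumptions made for illustration.

```python
def rebin_uniform(buckets, lo, hi, num_bins=30):
    """Redistribute (left, right, count) buckets into equal-width bins.

    Hypothetical helper: each bucket's count is split across the bins it
    overlaps, in proportion to the overlap. Returns (bin_left, width, y) tuples.
    """
    if hi == lo:
        # Degenerate range (all values equal): widen it so bins have a width
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / num_bins
    bins = []
    for i in range(num_bins):
        bin_left = lo + i * width
        bin_right = bin_left + width
        y = 0.0
        for left, right, count in buckets:
            # Clip the bucket to the plotted range
            left, right = max(left, lo), min(right, hi)
            if right <= left:
                continue
            overlap = min(right, bin_right) - max(left, bin_left)
            if overlap > 0:
                y += count * overlap / (right - left)
        bins.append((bin_left, width, y))
    return bins


# Made-up example: two buckets of unequal width, rebinned into three bins
example = rebin_uniform([(0.0, 1.0, 4.0), (1.0, 3.0, 8.0)], lo=0.0, hi=3.0, num_bins=3)
```

If a matplotlib plot is built from the raw buckets without a step like this, the bars have unequal widths, which may be one reason the shapes differ.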

Can anyone shed some light on this?

Thanks

Upvotes: 6

Views: 3863

Answers (4)

J3soon

Reputation: 3153

The simplest way is to parse the events with tbparse and plot the histograms as a seaborn ridge plot (the kde_ridgeplot gallery example).

This tutorial generates the stacked distribution plot with around 30 lines of Python code:

  • Tensorboard preview: [image: tensorboard]

  • Parsed by tbparse and plotted with seaborn: [image: tbparse]

You can open an issue if you run into any problems during parsing. (I'm the author of tbparse.)

Upvotes: 1

Mike W

Reputation: 1403

The best solution is to load all events and reconstruct every histogram (as in @khuesmann's answer), but with EventFileLoader instead of EventAccumulator. This gives you one histogram per wall time and step, like the ones TensorBoard plots. It can be extended to return a list of actions by step and wall time.

Don't forget to check which tag you want to use.

from tensorboard.backend.event_processing.event_file_loader import EventFileLoader

# Define these first:
#   PATH_OF_FILE  - path of the event file, not the folder
#   HISTOGRAM_TAG - tag of the histogram you want to extract
loader = EventFileLoader(PATH_OF_FILE)

# Where to store values
wtimes, steps, actions = [], [], []
for event in loader.Load():
    # An event can carry several summary values, so check each one
    for summary in event.summary.value:
        if summary.tag != HISTOGRAM_TAG:
            continue
        # Repeat each bucket limit by its bucket count
        values = []
        for num, val in zip(summary.histo.bucket, summary.histo.bucket_limit):
            values += [val] * int(num)
        actions += values
        # Keep wall times and steps aligned one-to-one with the actions
        wtimes += [event.wall_time] * len(values)
        steps += [event.step] * len(values)

Bear in mind that TensorFlow stores the actions in approximate buckets and treats them as continuous variables, so even if your actions are discrete (e.g. 0, 1, 3) you will end up with actions such as 0.2, 0.4, 0.9, 1.4, ... In that case, rounding the values will do.
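
A minimal sketch of that rounding step, using made-up bucket data rather than a real event file. expand_and_round is a hypothetical helper, and rounding each bucket limit to the nearest integer assumes the discrete actions are integers:

```python
from collections import Counter


def expand_and_round(bucket_limit, bucket):
    """Repeat each bucket limit by its count, rounded to the nearest integer."""
    actions = []
    for limit, count in zip(bucket_limit, bucket):
        actions += [round(limit)] * int(count)
    return actions


# Made-up buckets: limits 0.2 and 0.4 round to action 0, 0.9 and 1.4 to action 1
actions = expand_and_round([0.2, 0.4, 0.9, 1.4], [3, 0, 2, 1])
counts = Counter(actions)
```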

Upvotes: 5

Mike W

Reputation: 1403

A good solution is the one from @khuesmann, but it only lets you retrieve the accumulated histogram, not the histogram per step, which is the one actually shown in TensorBoard.

If you want the distribution: as far as I understand, TensorBoard usually compresses the histograms to reduce the memory needed to store the data (imagine storing a 2D histogram over 4 million steps; the memory use grows quickly). These compressed histograms are accessible like this:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

n2n = EventAccumulator(PATH)
n2n.Reload()

# Check the tags under histograms and choose the one you want
n2n.Tags()

# This will give you the list of compressed histograms
# used by TensorBoard, by step and wall time
n2n.CompressedHistograms(HISTOGRAM_TAG)

The only problem is that it compresses the histogram to nine percentiles (in basis points they are 0, 668, 1587, 3085, 5000, 6915, 8413, 9332, 10000), which correspond to (-Inf, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, Inf) standard deviations of a normal distribution. Check the code here.
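
The correspondence between basis points and standard deviations can be checked with the standard normal CDF. A small sketch using only the Python standard library; std_to_basis_points is a hypothetical helper, not part of TensorBoard:

```python
import math
from statistics import NormalDist


def std_to_basis_points(z):
    """Map a z-score to the standard normal CDF in basis points (1/10000)."""
    if math.isinf(z):
        # Handle the open ends explicitly instead of relying on cdf(+-inf)
        return 0 if z < 0 else 10000
    return round(NormalDist().cdf(z) * 10000)


STD_DEVS = (-math.inf, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, math.inf)
basis_points = [std_to_basis_points(z) for z in STD_DEVS]
print(basis_points)
```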

I haven't read much of the code, but it shouldn't be hard to reconstruct the temporal histograms that TensorBoard shows. If I find a way to do it, I will post it here.

Upvotes: 1

khuesmann

Reputation: 198

To plot a TensorBoard histogram with matplotlib, I do the following:

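import numpy as np
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def load_histograms(path, step_count):
    # size_guidance sets how many histogram events are kept per tag
    event_acc = EventAccumulator(path, size_guidance={
        'histograms': step_count,
    })
    event_acc.Reload()
    tags = event_acc.Tags()
    result = {}
    for hist in tags['histograms']:
        histograms = event_acc.Histograms(hist)
        # One array per step; a plain list, since steps can differ in length.
        # np.int was removed from recent NumPy, so use the builtin int.
        result[hist] = [np.repeat(np.array(h.histogram_value.bucket_limit),
                                  np.array(h.histogram_value.bucket).astype(int))
                        for h in histograms]
    return result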

h.histogram_value.bucket_limit gives me the values and h.histogram_value.bucket the count of each value. So when I repeat the values accordingly (np.repeat(...)), I get a huge array of the expected size. This array can then be plotted with the default matplotlib logic.
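
If the repeated array becomes too large, the bars can also be drawn straight from the bucket edges. This is a sketch under assumptions: each bucket is taken to span from the previous bucket limit to its own limit, clipped to the histogram's min and max. buckets_to_bars and plot_bars are hypothetical helpers, and the numbers at the bottom are made up:

```python
def buckets_to_bars(bucket_limit, bucket, lo, hi):
    """Turn bucket limits and counts into (left, width, count) bars.

    lo and hi are the histogram's min and max; they clip the first and
    last non-empty buckets, whose outer limits can be extremely large.
    """
    bars = []
    left = lo
    for right, count in zip(bucket_limit, bucket):
        right = min(right, hi)
        if count > 0 and right >= left:
            bars.append((left, right - left, count))
        left = max(left, right)
    return bars


def plot_bars(bars):
    """Draw the bars with matplotlib (imported lazily, only when plotting)."""
    import matplotlib.pyplot as plt

    lefts, widths, counts = zip(*bars)
    fig, ax = plt.subplots()
    ax.bar(lefts, counts, width=widths, align="edge")
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    return fig


# Made-up histogram: the last limit stands in for a very large sentinel value
bars = buckets_to_bars([0.0, 1.0, 2.0, 1e308], [0, 3, 5, 2], lo=0.5, hi=2.5)
```

Calling plot_bars(bars) returns a matplotlib figure. With real data, h.histogram_value.min and h.histogram_value.max would be passed as lo and hi.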

Upvotes: 5
