Reputation: 5073
How can you write a Python script that reads TensorBoard log files and extracts the loss, accuracy, and other numerical data, without launching the GUI via tensorboard --logdir=...?
Upvotes: 45
Views: 30066
Reputation: 1
Try this:
python << EOF | bat --color=always -pltsv
from contextlib import suppress
import sys
from time import gmtime, strftime

import pandas as pd
from tensorboard.backend.event_processing.event_file_loader import (
    EventFileLoader,
)

df = pd.DataFrame(columns=["Step", "Value"])
df.index.name = "YYYY-mm-dd HH:MM"

# "\$1" is the shell positional argument: the path to an events file
for event in EventFileLoader("$1").Load():
    with suppress(IndexError):
        df.loc[strftime("%F %H:%M", gmtime(event.wall_time))] = [
            event.step,
            event.summary.value[0].tensor.float_val[0],
        ]

df.index = pd.to_datetime(df.index)
df.Step = df.Step.astype(int)
df.to_csv(sys.stdout, sep="\t")
EOF
Piping through bat is optional; it only pretty-prints the TSV output.
Upvotes: 0
Reputation: 450
For anyone interested, I've adapted user1501961's answer into a function that parses TensorBoard scalars into a dictionary of pandas DataFrames:
from tensorboard.backend.event_processing import event_accumulator
import pandas as pd


def parse_tensorboard(path, scalars):
    """returns a dictionary of pandas dataframes for each requested scalar"""
    ea = event_accumulator.EventAccumulator(
        path,
        size_guidance={event_accumulator.SCALARS: 0},
    )
    _absorb_print = ea.Reload()
    # make sure the scalars are in the event accumulator tags
    assert all(
        s in ea.Tags()["scalars"] for s in scalars
    ), "some scalars were not found in the event accumulator"
    return {k: pd.DataFrame(ea.Scalars(k)) for k in scalars}
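For illustration, here is a hypothetical way to consume the function's return value. The dict below is a synthetic stand-in for what parse_tensorboard would return (the "Loss" tag and the sample values are placeholders, not real log data); each DataFrame has the wall_time, step, and value columns that pd.DataFrame(ea.Scalars(tag)) produces.

```python
import pandas as pd

# Synthetic stand-in for parse_tensorboard(path, ["Loss"]) output;
# real code would call the function on an actual log directory.
dfs = {
    "Loss": pd.DataFrame(
        {
            "wall_time": [1481232633.08, 1481232633.20],
            "step": [1, 2],
            "value": [1.6365, 1.2162],
        }
    ),
}

# Write each scalar series to its own CSV, replacing '/' in tag names
# so nested tags like "train/loss" yield valid filenames.
for tag, df in dfs.items():
    df.to_csv(f"{tag.replace('/', '_')}.csv", index=False)
```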
Upvotes: 6
Reputation: 868
You can use TensorBoard's Python classes or script to extract the data:
How can I export data from TensorBoard?
If you'd like to export data to visualize elsewhere (e.g. iPython Notebook), that's possible too. You can directly depend on the underlying classes that TensorBoard uses for loading data: python/summary/event_accumulator.py (for loading data from a single run) or python/summary/event_multiplexer.py (for loading data from multiple runs, and keeping it organized). These classes load groups of event files, discard data that was "orphaned" by TensorFlow crashes, and organize the data by tag.
As another option, there is a script (tensorboard/scripts/serialize_tensorboard.py) which will load a logdir just like TensorBoard does, but write all of the data out to disk as json instead of starting a server. This script is setup to make "fake TensorBoard backends" for testing, so it is a bit rough around the edges.
Using EventAccumulator:
# In [1]: from tensorflow.python.summary import event_accumulator # deprecated
In [1]: from tensorboard.backend.event_processing import event_accumulator
In [2]: ea = event_accumulator.EventAccumulator('events.out.tfevents.x.ip-x-x-x-x',
...: size_guidance={ # see below regarding this argument
...: event_accumulator.COMPRESSED_HISTOGRAMS: 500,
...: event_accumulator.IMAGES: 4,
...: event_accumulator.AUDIO: 4,
...: event_accumulator.SCALARS: 0,
...: event_accumulator.HISTOGRAMS: 1,
...: })
In [3]: ea.Reload() # loads events from file
Out[3]: <tensorflow.python.summary.event_accumulator.EventAccumulator at 0x7fdbe5ff59e8>
In [4]: ea.Tags()
Out[4]:
{'audio': [],
'compressedHistograms': [],
'graph': True,
'histograms': [],
'images': [],
'run_metadata': [],
'scalars': ['Loss', 'Epsilon', 'Learning_rate']}
In [5]: ea.Scalars('Loss')
Out[5]:
[ScalarEvent(wall_time=1481232633.080754, step=1, value=1.6365480422973633),
ScalarEvent(wall_time=1481232633.2001867, step=2, value=1.2162202596664429),
ScalarEvent(wall_time=1481232633.3877788, step=3, value=1.4660096168518066),
ScalarEvent(wall_time=1481232633.5749283, step=4, value=1.2405034303665161),
ScalarEvent(wall_time=1481232633.7419815, step=5, value=0.897326648235321),
...]
size_guidance: Information on how much data the EventAccumulator should
store in memory. The DEFAULT_SIZE_GUIDANCE tries not to store too much
so as to avoid OOMing the client. The size_guidance should be a map
from a `tagType` string to an integer representing the number of
items to keep per tag for items of that `tagType`. If the size is 0,
all events are stored.
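Each entry returned by ea.Scalars(tag) is a ScalarEvent named tuple, so the fields can be pulled out directly. A minimal sketch, using synthetic named tuples in place of events loaded from a real log file:

```python
from collections import namedtuple

# Mimics the ScalarEvent named tuples returned by ea.Scalars(tag);
# real code would iterate over events from an actual event file.
ScalarEvent = namedtuple("ScalarEvent", ["wall_time", "step", "value"])
events = [
    ScalarEvent(1481232633.080754, 1, 1.6365480422973633),
    ScalarEvent(1481232633.2001867, 2, 1.2162202596664429),
]

# Split the events into parallel lists of steps and values
steps = [e.step for e in events]
values = [e.value for e in events]
print(steps)  # -> [1, 2]
```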
Upvotes: 58
Reputation: 5073
To finish user1501961's answer, you can then export the list of scalars to a CSV file easily with pandas: pd.DataFrame(ea.Scalars('Loss')).to_csv('Loss.csv')
Upvotes: 9