Reputation: 31
It is puzzling to me that there is a tfdv.load_statistics() function but no corresponding tfdv.write_statistics() function. How do I go about saving the statistics and then loading them again?
e.g.
import tensorflow_data_validation as tfdv
stats = tfdv.generate_statistics_from_dataframe(df)
# how do I save?
# load back for later use
saved_stats = tfdv.load_statistics('saved_stats.stats')
I can save the string representation to a file, but this is not the format that load_statistics expects.
with open('saved_stats.stats', 'w') as o:
    o.write(str(stats))
Pointers anyone?
Upvotes: 0
Views: 836
Reputation: 326
There's a function called tfdv.load_stats_binary that you can use to solve this problem.
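A rough sketch of how that could fit together, assuming tfdv.load_stats_binary reads a file containing the serialized bytes of the statistics proto. The helper names save_stats_binary and load_saved_stats are made up for illustration and are not part of TFDV:

```python
def save_stats_binary(stats, path):
    """Write a statistics proto to `path` as serialized bytes.

    Hypothetical helper: `stats` is assumed to be the proto returned by
    tfdv.generate_statistics_from_dataframe(df).
    """
    with open(path, "wb") as f:
        f.write(stats.SerializeToString())


def load_saved_stats(path):
    """Read the statistics proto back with tfdv.load_stats_binary."""
    # Imported lazily so the save helper above has no TFDV dependency.
    import tensorflow_data_validation as tfdv

    return tfdv.load_stats_binary(path)


# Intended usage (needs TFDV and a pandas DataFrame `df`):
#   stats = tfdv.generate_statistics_from_dataframe(df)
#   save_stats_binary(stats, "saved_stats.stats")
#   loaded_stats = load_saved_stats("saved_stats.stats")
```

The binary form is not human-readable, so this only suits the case where the file is read back by code.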
Upvotes: 0
Reputation: 1156
As of tfdv version 1.3.0, the following methods can be used: tfdv.write_stats_text and tfdv.load_stats_text.
Example:
import tensorflow_data_validation as tfdv
stats = tfdv.generate_statistics_from_dataframe(df)
stats_path = "my-stats-file.stats"
# saving
tfdv.write_stats_text(stats, stats_path)
# loading
stats = tfdv.load_stats_text(stats_path)
Upvotes: 1
Reputation: 31
Okay, I figured out this hacky way to do it.
df = ... # create pandas df
from tensorflow_metadata.proto.v0 import statistics_pb2
import tensorflow_data_validation as tfdv
stats = tfdv.generate_statistics_from_dataframe(df)
# save it
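with open('saved_stats.stats', 'wb') as o:
    o.write(stats.SerializeToString())
# load back for later use
with open('saved_stats.stats', 'rb') as i:
    # FromString is a method of the message class, not of the statistics_pb2 module
    loaded_stats = statistics_pb2.DatasetFeatureStatisticsList.FromString(i.read())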
Upvotes: 0