Stefan Krawczyk

Reputation: 31

How do I save TFDV stats in the correct format so they can be loaded back in?

It is puzzling to me that there is a tfdv.load_statistics() function, but no corresponding tfdv.write_statistics() function. How do I go about saving the statistics, and then loading them again?

e.g.

import tensorflow_data_validation as tfdv
stats = tfdv.generate_statistics_from_dataframe(df)

# how do I save?


# load back for later use
saved_stats = tfdv.load_statistics('saved_stats.stats')

I can save the string representation to a file, but this is not the format that load_statistics expects.

with open('saved_stats.stats', 'w') as o:
    o.write(str(stats))

Pointers anyone?

Upvotes: 0

Views: 836

Answers (4)

Pritam Dodeja

Reputation: 326

There's a function called tfdv.load_stats_binary that you can use to solve this problem.

Upvotes: 0

pvasek

Reputation: 1156

As of tfdv version 1.3.0, the stats can be saved with tfdv.write_stats_text and loaded back with tfdv.load_stats_text.

Example:

import tensorflow_data_validation as tfdv

stats = tfdv.generate_statistics_from_dataframe(df)
stats_path = "my-stats-file.stats"

# saving
tfdv.write_stats_text(stats, stats_path)


# loading
stats = tfdv.load_stats_text(stats_path)

Upvotes: 1

Amine_h

Reputation: 129

Have you tried tfdv.utils.stats_util.write_stats_text?

Upvotes: 1

Stefan Krawczyk

Reputation: 31

Okay, I figured out this hacky way to do it.

df = ... # create pandas df
from tensorflow_metadata.proto.v0 import statistics_pb2
import tensorflow_data_validation as tfdv
stats = tfdv.generate_statistics_from_dataframe(df)

# save it
with open('saved_stats.stats', 'wb') as o:
    o.write(stats.SerializeToString())

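# load back for later use
# (FromString lives on the message class, not on the statistics_pb2 module)
with open('saved_stats.stats', 'rb') as i:
    loaded_stats = statistics_pb2.DatasetFeatureStatisticsList.FromString(i.read())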

Upvotes: 0
