Vincent Teyssier

Reputation: 2217

TFDV (TensorFlow Data Validation): how can I save/load the protobuf schema to/from a file?

TFDV generates the schema as a Schema protocol buffer. However, there seems to be no helper function to write the schema to a file or read it back.

schema = tfdv.infer_schema(stats)

How can I save it to a file and load it again?

Upvotes: 5

Views: 2388

Answers (3)

mannuscript

Reputation: 4941

tensorflow_data_validation itself provides utility functions for this:

from tensorflow_data_validation.utils.schema_util import write_schema_text, load_schema_text
write_schema_text(schema, "./my_schema")
schema = load_schema_text("./my_schema")

Upvotes: 1

Tim Smole

Reputation: 121

If you will be using the schema with TensorFlow Transform, I would suggest the following functions:

import os

import tensorflow_data_validation as tfdv
from tensorflow.python.lib.io import file_io
from tensorflow_transform.tf_metadata import metadata_io

# Define file path (OUTPUT_DIR is assumed to be defined already)
file_io.recursive_create_dir(OUTPUT_DIR)
schema_file = os.path.join(OUTPUT_DIR, 'schema.pbtxt')

# Write schema
tfdv.write_schema_text(schema, schema_file)

# Read schema with tfdv
schema = tfdv.load_schema_text(schema_file)

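# Read schema with tensorflow_transform
# (read_metadata takes the directory and returns a DatasetMetadata wrapping the schema)
metadata = metadata_io.read_metadata(OUTPUT_DIR)
schema = metadata.schema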

The output is human-readable protobuf text, similar to JSON. If you prefer to save it as plain JSON, you can use the following:

from google.protobuf import json_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2

def write_schema(schema, output_path):
    schema_text = json_format.MessageToJson(schema)
    file_io.write_string_to_file(output_path, schema_text)

def load_schema(input_path):
    schema_text = file_io.read_file_to_string(input_path)
    schema = json_format.Parse(schema_text, schema_pb2.Schema())
    return schema

Or, if you don't need a human-readable format, you can use the protobuf message methods SerializeToString() and ParseFromString(data) to serialize and deserialize the schema in binary form.
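A minimal sketch of that binary route, assuming local file paths (pathlib is used here instead of file_io, so remote filesystems such as GCS are not covered). The names write_schema_binary, load_schema_binary and message_cls are made up for illustration; only SerializeToString() and ParseFromString() come from the protobuf message API.

```python
from pathlib import Path


def write_schema_binary(schema, output_path):
    """Write a protobuf message (e.g. a TFDV Schema) in binary wire format."""
    # SerializeToString() returns bytes, so the file is written in binary mode.
    Path(output_path).write_bytes(schema.SerializeToString())


def load_schema_binary(input_path, message_cls):
    """Read a binary file into a new instance of message_cls."""
    # message_cls is the message class itself, not an instance.
    schema = message_cls()
    schema.ParseFromString(Path(input_path).read_bytes())
    return schema
```

With TFDV, this would be called as write_schema_binary(schema, "schema.pb") and load_schema_binary("schema.pb", schema_pb2.Schema), where schema_pb2 is imported from tensorflow_metadata.proto.v0 as in the JSON example. The resulting file is not human-readable, so the text or JSON variants are probably the better choice if the schema is going to be reviewed or edited by hand.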

Upvotes: 2

Paul Suganthan

Reputation: 86

You can use the following methods to write/load the schema to/from a file.

from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2

def write_schema(schema, output_path):
  schema_text = text_format.MessageToString(schema)
  file_io.write_string_to_file(output_path, schema_text)

def load_schema(input_path):
  schema = schema_pb2.Schema()
  schema_text = file_io.read_file_to_string(input_path)
  text_format.Parse(schema_text, schema)
  return schema

Upvotes: 6
