Reputation: 615
I am using Colab to run a tutorial on TensorFlow Ranking. It uses wget to fetch the TFRecord file:
!wget -O "/tmp/train.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/ELWC/train.tfrecords"
I am using this code to try to look at the structure of the tfrecord:
for example in tf.compat.v1.python_io.tf_record_iterator("/tmp/train.tfrecords"):
    print(tf.train.Example.FromString(example))
    break
And I am getting:
DecodeError: Error parsing message
How can I inspect the structure of TFRecords in general?
A second question: where can I find documentation for classes like tf.train.Example? I only find this empty page.
Upvotes: 3
Views: 816
Reputation: 48330
The root of the problem is that the records are serialized with a different schema: ExampleListWithContext (ELWC) rather than the basic tf.train.Example. Deserializing with the matching schema solves the problem.
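import tensorflow as tf
from tensorflow_serving.apis import input_pb2  # from the tensorflow-serving-api package

filenames = ['/tmp/train.tfrecords']
raw_dataset = tf.data.TFRecordDataset(filenames)
for record in raw_dataset.take(1):
    # Parse the raw bytes with the ELWC schema rather than tf.train.Example.
    elwc = input_pb2.ExampleListWithContext.FromString(record.numpy())
    print(elwc.context)
    # Each entry in elwc.examples is itself a tf.train.Example.
    for example in elwc.examples:
        print(example)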
Output:
features {
  feature {
    key: "query"
    value {
      bytes_list {
        value: "why do ..."
      }
    }
  }
  feature {
    key: "query_bert_encoder_outputs"
    value {
      float_list {
        ...
      }}
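On the more general question of looking at a TFRecord file when you don't know which schema it was written with: one rough option is to skip TensorFlow entirely and list the top-level protobuf fields of the first record. The sketch below is only that, a sketch. It assumes the standard TFRecord framing (a little-endian uint64 length, a 4-byte CRC, the payload, another 4-byte CRC), it does not verify the CRCs, and the helper names read_tfrecord_payloads, read_varint and top_level_fields are made up for this example.

```python
import struct


def read_tfrecord_payloads(path, limit=1):
    """Yield up to `limit` raw record payloads from a TFRecord file.

    The CRC fields are skipped, not verified.
    """
    with open(path, "rb") as f:
        count = 0
        while count < limit:
            header = f.read(12)  # uint64 length + uint32 CRC of the length
            if len(header) < 12:
                return  # end of file (or a truncated record)
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            f.read(4)  # uint32 CRC of the payload
            yield payload
            count += 1


def read_varint(buf, pos):
    """Decode a protobuf base-128 varint at `pos`; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def top_level_fields(buf):
    """Return (field_number, kind, value_or_size) for each top-level field."""
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
            fields.append((field_number, "varint", value))
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
            fields.append((field_number, "fixed64", 8))
        elif wire_type == 2:  # bytes, string, or nested message
            length, pos = read_varint(buf, pos)
            pos += length
            fields.append((field_number, "length-delimited", length))
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
            fields.append((field_number, "fixed32", 4))
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields
```

Calling top_level_fields(payload) on each payload from read_tfrecord_payloads("/tmp/train.tfrecords") prints only field numbers and sizes, not names, since names live in the .proto schema. A tf.train.Example has a single top-level field (number 1, features), so a record that shows more than one top-level length-delimited field was probably written with some other message type, and that is a hint to go looking for the right schema.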
Upvotes: 3