PascalIv

Reputation: 615

Reading a tfrecord: DecodeError: Error parsing message

I am using Colab to run a TensorFlow Ranking tutorial. It uses wget to fetch the TFRecord file:

!wget -O "/tmp/train.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/ELWC/train.tfrecords"

I am using this code to try to look at the structure of the tfrecord:

for example in tf.compat.v1.python_io.tf_record_iterator("/tmp/train.tfrecords"):
    print(tf.train.Example.FromString(example))
    break

And I am getting:

DecodeError: Error parsing message

How can I inspect the structure of a TFRecord file in general?

A second question: where can I find documentation on classes like tf.train.Example? I only find this empty page.

Upvotes: 3

Views: 816

Answers (1)

fabmilo

Reputation: 48330

The root of the problem is that the records are serialized with a different schema: ExampleListWithContext (ELWC) rather than the basic tf.train.Example. Deserializing each record with the matching message class solves the problem.

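import tensorflow as tf
from tensorflow_serving.apis import input_pb2  # defines ExampleListWithContext

filenames = ['/tmp/train.tfrecords']
raw_dataset = tf.data.TFRecordDataset(filenames)
for record in raw_dataset.take(1):
    # Each record is a serialized ExampleListWithContext, not a tf.train.Example.
    elwc = input_pb2.ExampleListWithContext.FromString(record.numpy())
    print(elwc.context)
    # Renamed loop variable so it no longer shadows the outer one.
    for example in elwc.examples:
        print(example)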

Output:

features {
  feature {
    key: "query"
    value {
      bytes_list {
        value: "why do ..."
      }
    }
  }
  feature {
    key: "query_bert_encoder_outputs"
    value {
      float_list {
...
}}
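
On the more general question of looking at the structure of a TFRecord file when the schema is unknown, one option is to read a record by hand and list its top-level protobuf fields. The following is only a minimal sketch using the Python standard library, not an official tool. It assumes the documented TFRecord framing (a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and it skips CRC verification. The helper names read_first_record, read_varint and top_level_fields are made up for this illustration.

```python
import io
import struct


def read_first_record(stream):
    """Return the payload bytes of the first record in a TFRecord stream."""
    header = stream.read(12)  # uint64 length + uint32 masked CRC of the length
    if len(header) < 12:
        raise ValueError("stream too short to contain a TFRecord header")
    (length,) = struct.unpack("<Q", header[:8])
    payload = stream.read(length)
    if len(payload) < length:
        raise ValueError("truncated record")
    stream.read(4)  # masked CRC of the payload; NOT verified in this sketch
    return payload


def read_varint(buf, pos):
    """Decode one base-128 varint at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def top_level_fields(payload):
    """List (field_number, wire_type, size_in_bytes) for each top-level field."""
    fields, pos = [], 0
    while pos < len(payload):
        tag, pos = read_varint(payload, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:  # varint
            start = pos
            _, pos = read_varint(payload, pos)
            size = pos - start
        elif wire_type == 1:  # fixed 64-bit
            size = 8
            pos += size
        elif wire_type == 2:  # length-delimited (strings, bytes, sub-messages)
            size, pos = read_varint(payload, pos)
            pos += size
        elif wire_type == 5:  # fixed 32-bit
            size = 4
            pos += size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, size))
    return fields


# Intended use on the downloaded file (path taken from the question):
#     with open("/tmp/train.tfrecords", "rb") as f:
#         print(top_level_fields(read_first_record(f)))
```

As far as I understand the two schemas, a serialized tf.train.Example has a single top-level field 1 (its features), while an ExampleListWithContext record has one field 1 entry per example plus an optional field 2 for the context. Seeing several length-delimited field 1 entries in one record is therefore a hint that the file is not made of plain tf.train.Example messages.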

Upvotes: 3
