Sam

Reputation: 427

TensorFlow: variable number of outputs per input, known only at runtime

I am writing an op in TensorFlow that takes some input (a reference to a file) and produces a variable number of outputs based on that input (a bunch of "chunks" of the file). The size of each chunk is specified when building the graph (e.g. 50 "records" from the file = 1 chunk), but the total size of the input (the number of records in the file) is unknown when the graph is constructed. I cannot specify it a priori because the input file is large (tens of GB), so scanning it up front is not feasible for my application.
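The access pattern I want is easy to express as a stream. Here is a plain-Python sketch (the names `read_chunks`, `records`, and `chunk_size` are just illustrative, not part of my op):

```python
from itertools import islice

def read_chunks(records, chunk_size=50):
    """Yield fixed-size chunks from an iterator of records, without
    knowing the total record count up front. The final chunk may be
    shorter than chunk_size."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

The point is that the number of chunks produced is only discovered by consuming the stream, which is exactly the property I need from the op.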

My first failed attempt was having the op produce one "output chunk" per Compute call. However, this left the rest of the chunks for a given input unproduced. Upon further inspection it looks like the runtime is not designed for this (though if I'm wrong please let me know!).

I tried making the output of the op a tensor that is unknown in only the first dimension. For example, if the output of a single chunk is TensorShape([2]) (using the TensorShape constructor), then the output of a multi-chunk version would be TensorShape([None, 2]). However, this precludes my use of many other TensorFlow features (such as FIFOQueue, which requires fully-defined shapes).

Can an op be created in TensorFlow where either of the following is true?

Upvotes: 1

Views: 883

Answers (2)

Sam

Reputation: 427

I eventually used a TensorFlow queue directly.

In the op definition:

REGISTER_OP("MyOp")
...
.Input("output_queue_handle: resource")
...;

Then in the op kernel itself:

#include "tensorflow/core/framework/queue_interface.h"

class MyOpImpl : public OpKernel {
...
  ~MyOpImpl() override {
    // LookupResource takes a reference on the queue; release it.
    if (output_queue_ != nullptr) output_queue_->Unref();
  }

  void Compute(OpKernelContext *ctx) override {
    if (output_queue_ == nullptr) {
      OP_REQUIRES_OK(ctx, LookupResource(ctx, HandleFromInput(ctx, 1), &output_queue_));
    }

    while (there_is_still_stuff_to_enqueue_from_input) {
      ... // do actual computation here
      QueueInterface::Tuple tuple;
      // construct tuple here
      OP_REQUIRES_OK(ctx, output_queue_->ValidateTuple(tuple));
      // TryEnqueue is asynchronous; block until this enqueue completes
      // before producing the next chunk.
      Notification n;
      output_queue_->TryEnqueue(tuple, ctx, [&n]() { n.Notify(); });
      n.WaitForNotification();
    }
  }

 private:
  QueueInterface *output_queue_ = nullptr;
};
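The key detail is that TryEnqueue is asynchronous: the callback fires once the enqueue has completed, and the Notification blocks Compute until then, turning the async enqueue into a synchronous step. The same pattern, sketched with Python's standard library (`queue.Queue` standing in for the TensorFlow queue, `threading.Event` for `Notification`; these are illustrative stand-ins, not TensorFlow APIs):

```python
import queue
import threading

def try_enqueue(q, item, callback):
    """Asynchronous enqueue: hand the item off to another thread and
    invoke the callback once it has actually been enqueued (mimicking
    QueueInterface::TryEnqueue)."""
    def worker():
        q.put(item)
        callback()
    threading.Thread(target=worker).start()

def compute(q, items):
    """The Compute loop above: kick off the async enqueue for each
    produced item, then block on the event before producing the next."""
    for item in items:
        done = threading.Event()        # plays the role of Notification n
        try_enqueue(q, item, done.set)  # done.set() is n.Notify()
        done.wait()                     # n.WaitForNotification()
```

Waiting after each enqueue keeps the chunks in order and applies the queue's backpressure to the producing op.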

Upvotes: 0

Hugh Perkins

Reputation: 8592

You can use a tensorflow queue.

A TensorFlow queue lets you pump any number of items into it, without needing to specify beforehand how many. The queue can then be read until it is empty.
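In other words, the producer and consumer only agree on the shape of one item, never on the item count. Abstractly (a stdlib `queue.Queue` sketch of the semantics, not TensorFlow code):

```python
import queue

def produce(q, source):
    """Pump however many items the source yields into the queue; the
    count need not be known when the pipeline is built."""
    for item in source:
        q.put(item)

def drain(q):
    """Read items from the queue until it is empty."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items
```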

Upvotes: 1
