Alex Harvey

Reputation: 215

sideInput from startBundle

Prior to the most recent SDK I was relying on the ability to access my sideInput inside of startBundle of my DoFn. I’m not sure of the history of refactoring but I seem to be having issues doing this now.

Essentially, I have an array that I want to iterate over within my process() method, and the array is small enough to fit in memory.

Is it valid to expect to access a sideInput within startBundle? And if so, how can I do that if startBundle is sent a Context instead of a ProcessContext?

Example:

    @Override
    public void startBundle(DoFn<KV<String, Iterable<String>>, String>.Context c) throws Exception {
        uniqueIds = Lists.newArrayList(c.sideInput(iterableView));
        super.startBundle(c);
    }

Upvotes: 0

Views: 413

Answers (1)

Frances

Reputation: 4041

The history is explained here: Why did #sideInput() method move from Context to ProcessContext in Dataflow beta

Do you need to do any processing on your side input to prepare it for use in processElement()? If not, I'd suggest just using View.asList() or View.asMap() and reading the side input directly in processElement(). Dataflow will cache it when possible to make this cheap. (Note that View.asList() is currently available on GitHub and will be in the next Maven release.)

If you need to do processing on your side input, and you are using the (default) GlobalWindow, then you can lazily initialize a field of your DoFn from within processElement(). However, if you are using Window.into(), you'll need to invalidate that cached value every time the element's window changes.
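
As a rough sketch of that lazy-initialization pattern, the helper below caches one prepared value and reloads it whenever the window changes. WindowedSideInputCache is a hypothetical class written for illustration, not part of the Dataflow SDK, and it uses only plain Java so it can be read on its own:

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of the Dataflow SDK): holds one prepared
 * side-input value and reloads it when the window changes.
 */
final class WindowedSideInputCache<W, V> {
    private W cachedWindow;  // window the cached value was loaded for
    private V cachedValue;   // prepared side input for cachedWindow
    private boolean loaded;  // false until the first load happens

    /** Returns the cached value, calling the loader only on first use or on a new window. */
    V get(W window, Supplier<V> loader) {
        if (!loaded || !Objects.equals(cachedWindow, window)) {
            cachedValue = loader.get();
            cachedWindow = window;
            loaded = true;
        }
        return cachedValue;
    }
}
```

Inside processElement() you would call something like cache.get(window, () -> Lists.newArrayList(c.sideInput(iterableView))), where window is the current element's window, obtained however your SDK version exposes it. With the GlobalWindow the window never changes, so the loader runs once per DoFn instance. Since a DoFn is serialized before it runs, the cache field would presumably need to be transient and created on first use.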

Upvotes: 1
