Reputation: 11

Is it possible to write ORC format to HDFS using the HDFS2FileSink operator in IBM InfoSphere Streams, or in any other way?

Is it possible to write ORC format using the HDFS2FileSink operator in IBM InfoSphere Streams?

Upvotes: 0

Answers (1)

ndsilva

Reputation: 409

No, it isn't possible at this time using the HDFS2FileSink operator. It only supports text or binary.

The streamsx.parquet toolkit has support for writing to Parquet.

Otherwise, you would have to create your own Java operator that receives the tuples and uses the ORC API to write them.

It is fairly straightforward to create a Java operator, as shown in this video. The Java Operator Development guide can walk you through the process. Specifically, see the section on writing a sink operator.

After creating a new Java operator, add code that calls the ORC API in the process method:

@Override
public void process(StreamingInput<Tuple> stream, Tuple tuple)
        throws Exception {
    // TODO Insert code here to process the incoming tuple,
    // typically sending tuple data to an external system or data store.
    // String value = tuple.getString("AttributeName");
}
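Since ORC is a columnar format, rows are usually handed to the writer in batches rather than one at a time. As a rough sketch of how the process method could buffer incoming values, here is a small helper that collects rows and flushes them in fixed-size batches. The class RowBatcher and its method names are hypothetical, not part of Streams or ORC, and the sketch uses only the Java standard library. In a real operator, I would expect the flush callback to copy the rows into an ORC VectorizedRowBatch and pass it to the ORC Writer's addRowBatch method, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Streams or ORC): buffers rows and hands
 * them to a flush callback in fixed-size batches.
 */
final class RowBatcher<R> {
    private final int maxBatchSize;
    private final Consumer<List<R>> flushTarget;
    private final List<R> pending = new ArrayList<>();

    RowBatcher(int maxBatchSize, Consumer<List<R>> flushTarget) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.flushTarget = flushTarget;
    }

    /** Call once per incoming tuple, for example from process(). */
    void add(R row) {
        pending.add(row);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Call at shutdown (or on final punctuation) so a partial batch is not lost. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the callback can keep the list after we clear ours.
        // Assumption: a real callback would write these rows through the ORC API.
        flushTarget.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With something like this in place, process would only need to call add with a value such as tuple.getString("AttributeName"), and the operator's shutdown logic would call flush once before closing the ORC writer.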

I would start with the ORC home page and choose the appropriate link for Hive or Hadoop.

Upvotes: 0
