David Wang

Reputation: 41

Apache Beam: counting HBase rows blocks and never returns

I have started trying out Apache Beam and want to use it to read an HBase table and count its rows. When I read the table without Count.globally, the rows are read correctly, but when I try to count the rows, the process hangs and never exits.

Here is the code, which is very simple:

Pipeline p = Pipeline.create(options);

p.apply("read", HBaseIO.read().withConfiguration(configuration).withTableId(HBASE_TABLE))
    .apply(ParDo.of(new DoFn<Result, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            Result result = c.element();
            String rowkey = Bytes.toString(result.getRow());
            System.out.println("row key: " + rowkey);
            c.output(rowkey);
        }
    }))
    .apply(Count.<String>globally())
    .apply("FormatResults", MapElements.via(new SimpleFunction<Long, String>() {
        public String apply(Long element) {
            System.out.println("result: " + element.toString());
            return element.toString();
        }
    }));

With Count.globally in place, the process never finishes. When I comment it out, the process prints all the rows.

Any ideas?

Upvotes: 0

Views: 292

Answers (1)

iemejia

Reputation: 81

Which version of Beam are you using?

Thanks for raising this issue. I tried to reproduce your case, and there does seem to be a problem: colliding versions of Guava break transforms used with HBaseIO. I have sent a pull request to fix the shading, and I will update you once it is merged so you can test whether it works.
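In the meantime, one rough way to see whether more than one Guava copy could be involved is to print where a given class is actually loaded from at runtime. The following is only a minimal diagnostic sketch, not part of the fix: the class name ClasspathProbe and the helper locate are hypothetical, and the two class names probed in main are illustrative assumptions (any Guava or HBase class would do). It uses only the standard library, so it does not reproduce the hang itself.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /**
     * Returns the location (usually a jar URL) the named class was loaded from,
     * or a short note when the class is missing or has no recorded source.
     */
    static String locate(String className) {
        try {
            // Load without running static initializers; we only want the origin.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Illustrative class names (assumptions); replace with the classes you suspect.
        String[] names = {
            "com.google.common.base.Stopwatch",
            "org.apache.hadoop.hbase.client.Result"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the pipeline. If the Guava class resolves to a jar other than the one HBase expects, that would be consistent with a version collision, although it does not prove one.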

Thanks again.

Upvotes: 1
