Reputation: 41
I have started trying out Apache Beam and am using it to read an HBase table and count its rows. When I read the table without Count.globally, the rows are read fine, but when I try to count the number of rows, the process hangs and never exits.
Here is the very simple code:
Pipeline p = Pipeline.create(options);
p.apply("read", HBaseIO.read().withConfiguration(configuration).withTableId(HBASE_TABLE))
    .apply(ParDo.of(new DoFn<Result, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            Result result = c.element();
            String rowkey = Bytes.toString(result.getRow());
            System.out.println("row key: " + rowkey);
            c.output(rowkey);
        }
    }))
    .apply(Count.<String>globally())
    .apply("FormatResults", MapElements.via(new SimpleFunction<Long, String>() {
        @Override
        public String apply(Long element) {
            System.out.println("result: " + element.toString());
            return element.toString();
        }
    }));
When I use Count.globally, the process never finishes. When I comment it out, the process prints all the rows.
Any ideas?
Upvotes: 0
Views: 292
Reputation: 81
Which version of Beam are you using?
Thanks for reporting this issue. I tried to reproduce your case, and there does indeed seem to be a problem with colliding versions of Guava that breaks transforms applied after HBaseIO. I have sent a pull request to fix the shading. I will keep you updated once it is merged so you can test whether it works.
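In the meantime, one way to check for this kind of collision yourself is to print which jar a Guava class is actually loaded from at runtime. The snippet below is only a rough sketch using plain JDK calls and is not part of the pull request; the class name GuavaLocator and the helper locationOf are made up for illustration, and com.google.common.base.Stopwatch is just an example of a Guava class to look up.

```java
import java.security.CodeSource;

public class GuavaLocator {

    /**
     * Returns the location (usually a jar URL) a class was loaded from,
     * or a placeholder string if the location cannot be determined.
     * Hypothetical helper, for illustration only.
     */
    static String locationOf(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false, GuavaLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself have no code source.
            return src == null ? "(JDK / bootstrap class loader)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Example Guava class; any other class name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "com.google.common.base.Stopwatch";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

If you run this with the same classpath as your pipeline and the location is not the Guava jar you expect, that would be consistent with a version conflict, although it does not prove that this is the cause of the hang.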
Thanks again.
Upvotes: 1