Reputation: 2759
I would like to be able to write new entries into HBase from a distributed (not local) Storm topology. There exist a few GitHub projects that provide either HBase Mappers or pre-made Storm bolts to write Tuples into HBase. These projects provide instructions for executing their samples on the LocalCluster.
The problem I am running into, both with these projects and when accessing the HBase API directly from the bolt, is that they all require the hbase-site.xml file to be on the classpath. With the direct API approach, and perhaps with the GitHub projects as well, calling HBaseConfiguration.create(); tries to find the information it needs from an entry on the classpath.
How can I modify the classpath for the Storm bolts to include the HBase configuration file?
Update: Using danehammer's answer, this is how I got it working:
Copy the following files into your ~/.storm directory:
Next, in your topology class's main method get the HBase Configuration and serialize it:
final Configuration hbaseConfig = HBaseConfiguration.create();
final DataOutputBuffer databufHbaseConfig = new DataOutputBuffer();
hbaseConfig.write(databufHbaseConfig);
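// getData() returns the whole backing buffer; only the first getLength() bytes are valid (needs java.util.Arrays)
final byte[] baHbaseConfigSerialized = Arrays.copyOf(databufHbaseConfig.getData(), databufHbaseConfig.getLength());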
Pass the byte array to your spout class through the constructor and have the spout store it in a field. Do not deserialize in the constructor: I found that if the spout has a Configuration field, you get a "cannot serialize" exception when running the topology.
In the spout's open method, deserialize the config and access the HBase table:
Configuration hBaseConfiguration = new Configuration();
ByteArrayInputStream bas = new ByteArrayInputStream(baHbaseConfigSerialized);
hBaseConfiguration.readFields(new DataInputStream(bas));
HTable tbl = new HTable(hBaseConfiguration, HBASE_TABLE_NAME);
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("YOUR_COLUMN"));
scnrTbl = tbl.getScanner(scan);
Now, in your nextTuple method you can use the Scanner to get the next row:
Result rsltWaveform = scnrTbl.next();
Extract what you want from the result, and pass those values in some serializable object to the bolts.
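The reason for carrying a byte array instead of the Configuration itself can be sketched with plain Java serialization, which is what rejects a non-serializable field. This is only an illustration under stated assumptions: FakeConfig is a hypothetical stand-in for the Hadoop Configuration class (not Serializable, but able to write itself to a stream), and ConfigCarrier is a hypothetical stand-in for the spout. Neither name comes from HBase or Storm.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Hadoop Configuration class:
// it is NOT java.io.Serializable, but it can write itself to a stream.
final class FakeConfig {
    final Map<String, String> props = new LinkedHashMap<>();

    void write(DataOutputStream out) throws IOException {
        out.writeInt(props.size());
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    void readFields(DataInputStream in) throws IOException {
        props.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            props.put(key, in.readUTF());
        }
    }
}

// Hypothetical stand-in for the spout: only the byte[] field is serialized.
final class ConfigCarrier implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] configBytes;   // travels with the serialized object
    private transient FakeConfig config; // rebuilt on the receiving side, never shipped

    ConfigCarrier(FakeConfig source) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        source.write(new DataOutputStream(buf));
        this.configBytes = buf.toByteArray();
    }

    // Plays the role of the spout's open method: rebuild the config from the bytes.
    void open() throws IOException {
        config = new FakeConfig();
        config.readFields(new DataInputStream(new ByteArrayInputStream(configBytes)));
    }

    String get(String key) {
        return config.props.get(key);
    }

    // Java-serialization round trip, standing in for shipping the object to a worker.
    static ConfigCarrier roundTrip(ConfigCarrier c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (ConfigCarrier) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeConfig cfg = new FakeConfig();
        cfg.props.put("hbase.zookeeper.quorum", "zk1.example.com"); // made-up host
        ConfigCarrier copy = roundTrip(new ConfigCarrier(cfg));
        copy.open();
        System.out.println(copy.get("hbase.zookeeper.quorum"));
    }
}
```

If the config field were not marked transient, writeObject would fail with a NotSerializableException, which is presumably the "cannot serialize" error described above.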
Upvotes: 3
Views: 3002
Reputation: 416
When you deploy a topology with the "storm jar" command, the ~/.storm folder will be on the classpath (see this link under jar command). If you place the hbase-site.xml file (or related *-site.xml files) in that folder, HBaseConfiguration.create() called during "storm jar" will find that file and correctly return an org.apache.hadoop.conf.Configuration. This then needs to be stored and serialized within your topology in order to distribute the config around the cluster.
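To confirm that the file is actually visible at submit time, a small classpath probe can be called from the topology's main method. This is a minimal sketch using only the JDK. ClasspathCheck and locate are hypothetical names, and it assumes the Hadoop configuration loader resolves resources through the thread context class loader, which is my understanding but worth verifying for your version.

```java
import java.net.URL;

// Hypothetical helper: reports where (or whether) a resource is visible on the classpath.
final class ClasspathCheck {

    // Returns the resource's location as seen by the thread context class loader,
    // or null if the resource is not on the classpath.
    static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "hbase-site.xml";
        URL where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " found at " + where);
    }
}
```

If locate returns null during "storm jar", the configuration object would most likely contain only defaults, so the serialized config shipped to the workers would not point at your cluster.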
Upvotes: 2