florins

Reputation: 1655

Streaming HDFS data to Storm (aka HDFS spout)

I would like to know whether there is a spout implementation for streaming data from HDFS into Storm (something similar to Spark Streaming from HDFS). I know there is a bolt implementation for writing data into HDFS (https://github.com/ptgoetz/storm-hdfs and http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_user-guide/content/ch_storm-using-hdfs-connector.html), but I could not find one for the other direction. I would appreciate any suggestions and hints.

Upvotes: 3

Views: 1591

Answers (1)

Kit Menke

Reputation: 7056

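One option is to use the Hadoop HDFS Java API. Assuming you are using Maven, you would include hadoop-common in your pom.xml (reading hdfs:// paths also needs hadoop-hdfs on the classpath; the hadoop-client artifact pulls in both):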

<dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-common</artifactId>
   <version>2.6.0.2.2.0.0-2041</version>
</dependency>

Then, in your spout implementation, you would use the HDFS FileSystem object. For example, here is a sketch of emitting each line of a file as a string:

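import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import backtype.storm.tuple.Values; // org.apache.storm.tuple.Values from Storm 1.0 on
import backtype.storm.utils.Utils;  // org.apache.storm.utils.Utils from Storm 1.0 on

// _collector (SpoutOutputCollector) is set in open(); _reader is a private
// BufferedReader field kept open across calls and closed in close()
@Override
public void nextTuple() {
   try {
      if (_reader == null) {
         Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
         FileSystem fs = FileSystem.get(pt.toUri(), new Configuration());
         _reader = new BufferedReader(new InputStreamReader(fs.open(pt), StandardCharsets.UTF_8));
      }
      // emit at most one line per call; Storm invokes nextTuple() in a loop
      String line = _reader.readLine();
      if (line != null) {
         _collector.emit(new Values(line));
      } else {
         Utils.sleep(50); // end of file: back off instead of spinning
      }
   } catch (IOException e) {
      _collector.reportError(e);
      LOG.error("HDFS spout error", e); // exception as last argument keeps the stack trace
   }
}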
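The spout above emits tuples without message IDs, so Storm cannot replay a line that fails downstream. If you need at-least-once delivery, the spout also has to remember what it emitted. Below is a minimal, JDK-only sketch of that bookkeeping under stated assumptions: the hypothetical `Emitter` interface stands in for `SpoutOutputCollector.emit(List<Object> tuple, Object messageId)`, the line number is used as the message ID, and pending lines live only in memory. In a real spout the `InputStream` would be the one returned by `fs.open(pt)`, and Storm would call `ack(Object)`/`fail(Object)` with the same ID.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SpoutOutputCollector.emit(List<Object>, Object messageId).
interface Emitter {
    void emit(String line, long messageId);
}

// Sketch of the bookkeeping a reliable line-reading spout needs.
// nextTuple/ack/fail mirror the ISpout methods of the same names.
class ReliableLineReader {
    private final BufferedReader reader;
    private final Emitter emitter;
    private final Map<Long, String> pending = new HashMap<>(); // emitted but not yet acked
    private final Deque<Long> retries = new ArrayDeque<>();    // failed ids waiting for re-emit
    private long nextId = 0;                                   // line number used as message id

    ReliableLineReader(InputStream in, Emitter emitter) {
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        this.emitter = emitter;
    }

    // Emits at most one tuple per call, replays first; returns false if nothing was emitted.
    boolean nextTuple() {
        Long retryId;
        while ((retryId = retries.poll()) != null) {
            String replay = pending.get(retryId);
            if (replay != null) { // skip ids that were acked after they failed
                emitter.emit(replay, retryId);
                return true;
            }
        }
        String line;
        try {
            line = reader.readLine();
        } catch (IOException e) {
            // ISpout.nextTuple() cannot throw checked exceptions
            throw new UncheckedIOException(e);
        }
        if (line == null) {
            return false; // end of input; a real spout would sleep briefly here
        }
        long id = nextId++;
        pending.put(id, line);
        emitter.emit(line, id);
        return true;
    }

    void ack(long id) {
        pending.remove(id);
    }

    void fail(long id) {
        // queue for replay once, and only if the line is still outstanding
        if (pending.containsKey(id) && !retries.contains(id)) {
            retries.add(id);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Because the pending map lives in the worker's memory, a worker restart loses it; a production spout would also need to persist how far it has read (for example, the last contiguously acked line) somewhere durable.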

Upvotes: 3
