Reputation: 2321
We are currently running various transformations on data in our HDFS clusters. I am new to the stack, and I am told that the transformed data is stored in a binary format in the form of containers.
Right now, the only way to query this data is through an intensive command via the CLI.
My question is: is it possible to build a RESTful interface to search the data in these containers? The decrypted data is in JSON format.
The reason I am doing this is to scale testing - if I can retrieve the data in a readable, parseable format (as opposed to binary), I can create automated testing hooks that can trigger based on updates. Changes can then be validated against the source easily.
Upvotes: 0
Views: 201
Reputation: 191738
Anything is possible™
"In the form of containers" is very unclear. "Containers" can mean lots of things: YARN containers, Docker containers, and so on.
My first thought would be to try Hive, PrestoDB, or Livy (Spark). Each of them lets you query the data on HDFS and is easier to put behind, or reach through, a REST API than a CLI command.
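As a rough illustration of the PrestoDB route, here is a minimal sketch of submitting SQL over Presto's HTTP client protocol (`POST /v1/statement`, then following `nextUri` until the results are exhausted). The coordinator URL, user name, and table name are placeholders I made up, and it assumes the data is already exposed to Presto as a table, for example through a Hive catalog. On Trino the header would be `X-Trino-User` instead.

```python
import json
import urllib.request


def build_presto_request(coordinator_url, sql, user):
    """Build (but do not send) the POST that submits a SQL statement."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


def run_query(coordinator_url, sql, user):
    """Submit a statement and collect all rows by following nextUri links."""
    rows = []
    request = build_presto_request(coordinator_url, sql, user)
    while True:
        with urllib.request.urlopen(request) as resp:
            page = json.load(resp)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Each page may carry a batch of rows; early pages often carry none.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:
            return rows
        # Later pages are fetched with a plain GET on the URI the server gave us.
        request = urllib.request.Request(
            next_uri, headers={"X-Presto-User": user}, method="GET"
        )
```

A test hook could then call something like `run_query("http://presto.example:8080", "SELECT * FROM hive.default.transformed LIMIT 10", "tester")` and compare the rows against the source. The host, catalog, and table in that call are hypothetical.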
Alternatively, rather than starting up a file system scan for each query, you could store the data differently, for example in HBase, Accumulo, or Ignite.
If you want really fast searches, though, you'll want to index the data. Solr and Elasticsearch are the two popular options, and both natively expose REST APIs built specifically for searching data.
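To give an idea of what searching through such an API looks like, here is a minimal sketch against Elasticsearch's `_search` endpoint using a simple `match` query. It assumes the decoded JSON documents have already been indexed. The base URL, index name, and field name are hypothetical, and a secured cluster would also need authentication.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, value, size=10):
    """Build (but do not send) a match query against <index>/_search."""
    body = {"query": {"match": {field: value}}, "size": size}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, index, field, value, size=10):
    """Run the query and return the original JSON documents that matched."""
    request = build_search_request(base_url, index, field, value, size)
    with urllib.request.urlopen(request) as resp:
        result = json.load(resp)
    # Each hit wraps the indexed document under "_source".
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

Since the results come back as plain JSON, an automated test can call `search(...)` after each update and assert on the returned documents directly.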
Upvotes: 1