Reputation: 1367
I have a lot of records (millions) in an HBase store, like this:
key = user_id:service_id:usage_timestamp, value = some_int
That means a user used service_id for some_int at usage_timestamp.
Now I want to provide a REST API for aggregating that data, for example "find the sum of all values for the requested user" or "find the max of them", and so on. So I'm looking for the best practice. A simple Java application doesn't meet my performance expectations.
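To make the two example aggregations concrete, here is a minimal in-memory sketch of them. It assumes the row keys sort lexicographically (as HBase row keys do) and uses a `TreeMap` as a stand-in for the table; the class name `UsageAggregator` and the sample keys are hypothetical, and no HBase client calls are involved.

```java
import java.util.NavigableMap;
import java.util.OptionalInt;
import java.util.TreeMap;

// Hypothetical helper: a sorted map stands in for the HBase table.
final class UsageAggregator {

    // Returns only the rows whose key starts with "<userId>:".
    static NavigableMap<String, Integer> rowsForUser(NavigableMap<String, Integer> rows, String userId) {
        String start = userId + ":";
        // ';' is the character right after ':', so this is an exclusive upper bound for the prefix.
        String stop = userId + ";";
        return rows.subMap(start, true, stop, false);
    }

    // "Find the sum of all values for the requested user".
    static long sum(NavigableMap<String, Integer> rows, String userId) {
        return rowsForUser(rows, userId).values().stream().mapToLong(Integer::longValue).sum();
    }

    // "Find the max of them"; empty if the user has no rows.
    static OptionalInt max(NavigableMap<String, Integer> rows, String userId) {
        return rowsForUser(rows, userId).values().stream().mapToInt(Integer::intValue).max();
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> rows = new TreeMap<>();
        rows.put("u1:svcA:1000", 10);
        rows.put("u1:svcB:2000", 50);
        rows.put("u2:svcA:1500", 7);
        System.out.println(sum(rows, "u1"));
        System.out.println(max(rows, "u1"));
    }
}
```

Because the key begins with user_id, all rows of one user are contiguous, so a per-user aggregate only has to touch that key range rather than the whole table.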
My current approach aggregates the data with an Apache Spark application. It looks good enough, but there are some issues with using it from a Java REST API, since Spark doesn't support a request-response model. (I have also taken a look at spark-job-server, but it seems raw and unstable.)
Any ideas?
Thanks.
Upvotes: 2
Views: 1121
Reputation: 16076
I see two possibilities:
If you want to isolate third-party apps from Spark, you can create a simple application that exposes a user-friendly endpoint and translates the queries it receives into Livy-Spark jobs, or into SQL to be run against Spark Thrift Server.
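As a rough sketch of the "translate the query into SQL" part, the endpoint could map a whitelisted aggregation name and a user id to a SQL string. The table name `usage_events` and the columns `user_id` and `usage_value` are assumptions made for illustration, as is the class name; how the data is exposed to Spark SQL is not covered here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical translator from an endpoint request to a SQL string.
final class AggregationQueryBuilder {

    // Only these aggregate functions may appear in the generated SQL.
    private static final List<String> ALLOWED = Arrays.asList("SUM", "MAX", "MIN", "AVG", "COUNT");

    static String toSql(String function, String userId) {
        String fn = function.toUpperCase(Locale.ROOT);
        if (!ALLOWED.contains(fn)) {
            throw new IllegalArgumentException("Unsupported aggregation: " + function);
        }
        // Reject anything that is not a plain identifier, to avoid SQL injection.
        if (!userId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("Invalid user id: " + userId);
        }
        // usage_events, user_id and usage_value are assumed names.
        return "SELECT " + fn + "(usage_value) AS result FROM usage_events WHERE user_id = '" + userId + "'";
    }

    public static void main(String[] args) {
        System.out.println(toSql("sum", "42"));
    }
}
```

Spark Thrift Server is reached over JDBC, so in a real service a `PreparedStatement` with a bound parameter would probably be preferable to string concatenation; the whitelist for the function name would still be needed, since function names cannot be bound as parameters.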
Upvotes: 1
Reputation: 29175
I would suggest HBase + Solr if you are using Cloudera (i.e. Cloudera Search).
Use the SolrJ API for aggregating the data (instead of Spark) and for interacting with the REST services.
Solr solution (in Cloudera it's Cloudera Search):
Indexing: use the NRT Lily indexer or a custom MapReduce Solr document creator to load the data as Solr documents.
If you don't like the NRT Lily indexer, you can use a Spark or MapReduce job with SolrJ to do the indexing. For example, Spark Solr: tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Data retrieval: use SolrJ to get the Solr docs from your web service call. In SolrJ:
There is FieldStatsInfo, through which sum, max, etc. can be obtained
There are facets and facet pivots to group data
Pagination is supported for REST API calls
You can integrate the Solr results with Jersey or some other web service framework, as we have already implemented it this way.
/**
 * Returns the requested page of records from the Solr server.
 * The result can be integrated with any REST API, such as Jersey.
 */
public SolrDocumentList getData(int start, int pageSize, SolrQuery query) throws SolrServerException {
    query.setStart(start);   // offset of the first row of the page
    query.setRows(pageSize); // number of rows per page
    LOG.info(ClientUtils.toQueryString(query, true));
    // Use POST: GET can fail when the request is very large.
    final QueryResponse queryResponse = solrCore.query(query, METHOD.POST);
    final SolrDocumentList solrDocumentList = queryResponse.getResults();
    if (isResultEmpty(solrDocumentList)) { // check whether the list is empty
        LOG.info("No records found for this query");
    }
    return solrDocumentList;
}
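The sum and max mentioned above come from Solr's stats component, which is switched on with the request parameters `stats=true` and `stats.field=<field>`. The following is only a sketch of how such a request's query string could be assembled in plain Java; the field names `user_id` and `usage_value` and the class name are assumptions about the index schema, not something taken from the setup described here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the query string of a Solr stats request.
final class SolrStatsParams {

    static String build(String userId, String statsField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "user_id:" + userId); // user_id is an assumed field name
        params.put("rows", "0");              // only the statistics are needed, not the documents
        params.put("stats", "true");          // enable the stats component
        params.put("stats.field", statsField);

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("42", "usage_value"));
    }
}
```

With SolrJ the same parameters can be set on the `SolrQuery` object, and the computed statistics are then read from the `QueryResponse` rather than from the document list.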
Also look at
my answer in "Create indexes in solr on top of HBase"
Note: I think the same can be achieved with Elasticsearch as well. But from my experience, I'm confident with Solr + SolrJ.
Upvotes: 1