Reputation: 21320
I have a Spark dataframe that I need to send as the body of an HTTP POST request. The storage system is Apache Solr. We create the Spark dataframe by reading a Solr collection. I can use the Jackson library to create JSON and send it over HTTP POST. Also, the dataframe may have millions of records, so the preferred way is to send them in batches over HTTP.
Below are the two approaches I can think of.
1) We can use the foreach/foreachPartition operations of the Spark dataframe and call HTTP POST from there, which means that the HTTP calls will happen within each executor (if I am not wrong). Is this approach right? Also, it means that if I have 3 executors, we can make 3 HTTP calls in parallel. Right? But won't opening and closing HTTP connections so many times cause issues?
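A minimal sketch of what the first approach could look like, assuming Java 11+ and that each batch has already been serialized to JSON (for example with Jackson). The class and method names (`PartitionBatchSender`, `sendInBatches`, `jsonPost`) are hypothetical and not part of Spark. The idea is that the iterator Spark hands to `foreachPartition` is drained in fixed-size batches, so one partition results in a handful of POSTs rather than one per row.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: not part of Spark or Solr.
final class PartitionBatchSender {

    // Drains one partition's iterator and hands fixed-size batches to `send`.
    // Returns the number of batches that were handed over.
    static <T> int sendInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> send) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int sent = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                send.accept(batch);
                sent++;
                batch = new ArrayList<>(batchSize); // fresh list so the receiver may keep the old one
            }
        }
        if (!batch.isEmpty()) { // final, possibly smaller batch
            send.accept(batch);
            sent++;
        }
        return sent;
    }

    // Builds a JSON POST request for one serialized batch.
    // The target URL is an assumption supplied by the caller.
    static HttpRequest jsonPost(String url, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Inside `foreachPartition`, one `java.net.http.HttpClient` could be created per partition and reused for every batch in it, which should avoid setting up a new connection for each request. The number of concurrent senders would then follow the number of partitions being processed at the same time, not strictly the number of executors.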
2) After getting the Spark dataframe, we can save it in some other Solr collection (using Spark). The data can then be read from that collection in batches using the Solr API (using the rows and start parameters), converted to JSON, and sent over HTTP requests.
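To make the `rows`/`start` idea concrete, here is a small sketch that only builds the request URLs for each page. It assumes the collection is reachable at a `/select` handler, that the total hit count is already known, and that the collection has a unique key field named `id` to sort on so that pages stay stable. The class name `SolrPageUrls` and the `id` field are assumptions, not something taken from the setup above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that enumerates one query URL per page.
final class SolrPageUrls {

    // Returns one /select URL per page needed to cover `numFound` documents.
    static List<String> pageUrls(String collectionUrl, long numFound, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (long start = 0; start < numFound; start += rows) {
            // `sort=id asc` assumes a unique key field called `id`
            urls.add(collectionUrl + "/select?q=*:*&sort=id%20asc"
                    + "&start=" + start + "&rows=" + rows + "&wt=json");
        }
        return urls;
    }
}
```

For 5 documents and `rows=2` this yields three URLs with `start` values 0, 2 and 4.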
I would like to know which of the above two approaches is preferred.
Upvotes: 1
Views: 884
Reputation: 29185
After getting the Spark dataframe, we can save it in some other SOLR collection (using Spark) and then data from that collection will be read to get the data in batches using SOLR API (using rows, start parameters), create JSON out of it and send it over HTTP request.
Of your two approaches, the second is the better one, since SolrJ gives you paging:
1) Save your dataframe as Solr documents so that they are indexed.
2) Use the SolrJ API, which interacts with your Solr collections and returns Solr documents based on your criteria.
3) Convert them to JSON using any parser and present them in UIs or in response to user queries.
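The steps above boil down to a paging loop. The sketch below keeps the Solr call behind a hypothetical `fetchPage(start, rows)` function so that it stays self-contained; with SolrJ that function would presumably run a `SolrQuery` with `setStart` and `setRows` applied and return the resulting documents. `PagedExport` and `exportAll` are illustrative names, not SolrJ API.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical paging loop; the actual Solr query lives in `fetchPage`.
final class PagedExport {

    // Fetches pages of up to `rows` items, starting at offset 0, and passes each
    // non-empty page to `send`. Stops at the first empty or short page.
    // Returns the total number of items handed to `send`.
    static <T> long exportAll(int rows,
                              BiFunction<Long, Integer, List<T>> fetchPage,
                              Consumer<List<T>> send) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        long start = 0;
        while (true) {
            List<T> page = fetchPage.apply(start, rows);
            if (page.isEmpty()) {
                break; // nothing left at this offset
            }
            send.accept(page); // e.g. serialize to JSON and POST
            start += page.size();
            if (page.size() < rows) {
                break; // short page means this was the last one
            }
        }
        return start;
    }
}
```

One caveat worth checking for collections with millions of documents: large `start` offsets get progressively more expensive in Solr, and its documentation describes cursor-based paging (`cursorMark`) as the alternative for deep paging.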
In fact, this is not a new approach. People who use HBase with Solr do it the same way (since querying HBase is really slow compared to querying Solr collections): each HBase table is a Solr collection that can be queried via SolrJ and presented in dashboards built with, for example, AngularJS.
[Figure: illustrative diagram of the approach]
Upvotes: 0