Anand

Reputation: 21320

Best way to send Spark dataframe as JSON body over HTTP POST

I have a Spark dataframe that I need to send as the body of an HTTP POST request. The storage system is Apache Solr; we create the Spark dataframe by reading a Solr collection. I can use the Jackson library to create the JSON and send it over HTTP POST. The dataframe may have millions of records, so the preferred way is to send them over HTTP in batches.
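A minimal sketch of the batching part, written in Python with only the standard library (the question mentions Jackson, i.e. Java; Python is used here purely for illustration). The helper names `json_batches`, `post_json` and `post_partition`, the batch size, and the target URL are all hypothetical, and it assumes each row is already a flat dict of JSON-serializable values.

```python
import json
import urllib.request
from itertools import islice
from typing import Iterable, Iterator, List


def json_batches(records: Iterable[dict], batch_size: int) -> Iterator[str]:
    """Group records into JSON array strings holding at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        # Pull the next batch_size records without materializing the whole input.
        batch: List[dict] = list(islice(it, batch_size))
        if not batch:
            return
        yield json.dumps(batch)


def post_json(url: str, body: str, timeout: float = 30.0) -> int:
    """POST one JSON body and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error statuses; no retries are attempted.
    """
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def post_partition(rows: Iterable[dict], url: str, batch_size: int = 1000) -> None:
    """Send one partition's records as a series of batched POST requests (hypothetical helper)."""
    for body in json_batches(rows, batch_size):
        post_json(url, body)
```

In PySpark this could be invoked once per partition, for example `df.foreachPartition(lambda rows: post_partition((r.asDict() for r in rows), url))`, so that each executor streams its own rows in batches instead of collecting millions of records on the driver.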

Below are the two approaches I can think of.

I would like to know which of the two approaches above is preferred.

Upvotes: 1

Views: 884

Answers (1)

Ram Ghadiyaram

Reputation: 29185

After getting the Spark dataframe, we can save it to another Solr collection (using Spark). The data in that collection is then read in batches through the Solr API (using the rows and start parameters), converted to JSON, and sent in an HTTP request.
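A sketch of the `start`/`rows` paging this approach relies on, building one Solr `/select` URL per page. The base URL, the collection name `export_copy`, and the helper `solr_page_urls` are hypothetical; `q`, `start`, `rows` and `wt` are standard Solr query parameters, and `num_found` is assumed to come from the `numFound` value of an earlier query.

```python
from typing import Iterator
from urllib.parse import urlencode


def solr_page_urls(base_url: str, collection: str, num_found: int,
                   rows: int, query: str = "*:*") -> Iterator[str]:
    """Yield one /select URL per page, advancing `start` by `rows` each time."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    for start in range(0, num_found, rows):
        params = {"q": query, "start": start, "rows": rows, "wt": "json"}
        # urlencode percent-escapes the query string, e.g. "*:*".
        yield f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"
```

With 5 matching documents and `rows=2`, this yields three URLs with `start` set to 0, 2 and 4.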

Out of your two approaches, the second is best, since SolrJ supports paging:

1) Save your dataframe as Solr documents with indexes.
2) Use SolrJ, the Java API that interacts with your Solr collections and returns Solr documents matching your criteria.
3) Convert the documents into JSON using any parser and present them in UIs or in response to user queries.
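Steps 2 and 3 can be sketched as a paging loop that turns each page of Solr documents into a JSON body ready to POST. This is a Python sketch against Solr's JSON response format (`response.numFound`, `response.docs`), not SolrJ itself; in SolrJ the same loop would set `start` and `rows` on the query. `fetch` is a hypothetical caller-supplied function and `solr_pages_as_json` is an illustrative name.

```python
import json
from typing import Callable, Iterator, List

# Hypothetical: fetch(start, rows) returns a parsed Solr /select JSON response,
# a dict shaped like {"response": {"numFound": N, "start": S, "docs": [...]}}.
FetchPage = Callable[[int, int], dict]


def solr_pages_as_json(fetch: FetchPage, rows: int) -> Iterator[str]:
    """Page through a Solr result set and yield each page's docs as a JSON array."""
    start = 0
    while True:
        response = fetch(start, rows)["response"]
        docs: List[dict] = response["docs"]
        if not docs:
            # Stop on an empty page so a shrinking result set cannot loop forever.
            return
        yield json.dumps(docs)
        start += len(docs)
        if start >= response["numFound"]:
            return
```

Passing `fetch` in keeps the loop testable without a running Solr instance; in practice it could request the matching `start`/`rows` page and `json.load` the response body, and each yielded string could then be sent as an HTTP POST body.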

In fact, this is not a new approach. People who use HBase with Solr do it the same way (since querying HBase is much slower than querying Solr collections): each HBase table has a corresponding Solr collection, which can be queried via SolrJ and presented in dashboards built with, for example, AngularJS.

A more illustrative diagram is below:

[Diagram: image not available]

Upvotes: 0
