Reputation: 53
There is a requirement where we get a stream of data from a Kafka stream, and our objective is to push this data to Solr.
We did some reading and found that there are a lot of Kafka Connect solutions available, but the problem is that we do not know which is the best option or how to implement it.
The options are:

Solr-Kafka Connector: https://github.com/MSurendra/kafka-connect-solr
Apache Storm

There is not much documentation or in-depth information available for the above-mentioned options.

Would anyone be kind enough to let me know how we can use a Solr connector and integrate it with a Kafka stream without using Confluent?

Also, with regard to Apache Storm, would it be possible for Apache Storm to accept the Kafka stream and push it to Solr, given that we would need some sanitization of the data before pushing it to Solr?
Upvotes: 0
Views: 1757
Reputation: 191973
I am avoiding Storm here because the question is mostly about Kafka Connect.
CAVEAT - The Solr connector in the question uses Kafka 0.9.0.1 dependencies, so it is very unlikely to work with the newest Kafka APIs.
I have not tested this connector myself; follow at your own risk.
The following is an excerpt from Confluent's documentation on using community connectors, with some emphasis and adaptations. In other words, it is written for Kafka Connect plugins that are not included in the Confluent Platform.
$ git clone https://github.com/MSurendra/kafka-connect-solr
Change into the newly cloned repo and check out the version you want. You will typically want to check out a released version, though this Solr connector has no releases the way the Confluent ones do.
$ cd kafka-connect-solr; mvn package
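A quick sanity check after the build is to list the Maven output (this project produces the single JAR mentioned below):

$ ls target/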
From here, see Installing Plugins
We copy the resulting Maven output in the target directory into one of the directories on the Kafka Connect worker's plugin path (the plugin.path property). For example, if the plugin path includes the /usr/local/share/kafka/plugins directory, we can use one of the following techniques to make the connector available as a plugin.
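For reference, plugin.path is set in the Connect worker configuration; a minimal sketch, assuming the plugin directory used throughout this answer (connect-standalone.properties is just the usual example of a worker config file):

# Kafka Connect worker config (e.g. connect-standalone.properties or connect-distributed.properties)
plugin.path=/usr/local/share/kafka/plugins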
As mentioned in the Confluent docs, the export CLASSPATH=<some path>/kafka-connect-solr-1.0.jar option would work, though plugin.path will be the way moving forward (Kafka 1.0+).
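For example, assuming the JAR ends up in the plugin directory used below (the exact path is up to you):

$ export CLASSPATH=/usr/local/share/kafka/plugins/kafka-connect-solr-1.0.jar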
You should know which option to use based on the result of mvn package. With this Solr connector, we get a single file named kafka-connect-solr-1.0.jar.
We copy that file into the /usr/local/share/kafka/plugins directory:
$ cp target/kafka-connect-solr-1.0.jar /usr/local/share/kafka/plugins/
(This does not apply to the Solr Connector)
If the connector's JARs are collected into a subdirectory of the build's target directories, we can copy all of these JARs into a plugin directory within /usr/local/share/kafka/plugins, for example:
$ mkdir -p /usr/local/share/kafka/plugins/kafka-connect-solr
$ cp target/kafka-connect-solr-1.0.0/share/java/kafka-connect-solr/* /usr/local/share/kafka/plugins/kafka-connect-solr/
Note
Be sure to install the plugin on all of the machines where you're running Kafka Connect distributed worker processes. It is important that every connector you use is available on all workers, since Kafka Connect will distribute the connector tasks to any of the workers.
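For example, a plain copy to each worker host would look something like the following (worker2 is a placeholder hostname, not anything from this project):

$ scp target/kafka-connect-solr-1.0.jar worker2:/usr/local/share/kafka/plugins/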
If you have properly set plugin.path or did export CLASSPATH, then you can use connect-standalone or connect-distributed with the appropriate config file for that Connect project.
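A rough sketch of the standalone invocation (the worker config is the one where plugin.path was set; solr-sink.properties is a hypothetical connector config, and its connector.class and Solr-specific settings have to come from this project's own documentation, they are not verified here):

$ connect-standalone connect-standalone.properties solr-sink.properties

where solr-sink.properties would contain at minimum something like:

name=solr-sink
connector.class=<the Solr sink connector class from this project>
topics=<the topic(s) to index into Solr>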
Regarding:
we would need some sanitization of data before pushing it to Solr
You would need to do that with a separate process, such as Kafka Streams, Storm, or something similar, running before Kafka Connect, and write your transformed output to a secondary topic that the connector reads. Alternatively, write your own Kafka Connect Transformation; Kafka Connect has very limited transformations (Single Message Transforms) out of the box.
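As a minimal sketch of the Kafka Streams route (the topic names, application id, broker address, and the trim() call standing in for real sanitization are all illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class SanitizeForSolr {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application id and broker address are examples; adjust to your environment
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sanitize-for-solr");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // read the raw topic, apply whatever cleanup is needed, write to a secondary topic
        KStream<String, String> raw = builder.stream("raw-events");   // hypothetical input topic
        raw.mapValues(value -> value.trim())                          // stand-in for real sanitization logic
           .to("sanitized-events");                                   // topic the Solr sink connector would read

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

The Solr sink connector's topics setting would then point at the secondary topic rather than the raw one.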
Also worth mentioning - JSON seems to be the only supported Kafka message format for this Solr connector
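If the messages are JSON, the Connect worker (or the connector's own config) would typically use the standard JSON converter for values; these property names are stock Kafka Connect settings, though whether schemas are expected depends on this particular connector:

value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false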
Upvotes: 2