pavan

Reputation: 791

Index XML output of a REST web service into a Solr server

How can I index the content of a web service into a Solr server?

My web service output looks like this: [image of the XML output, not included]

I want to index the content under the XML shown above into the Solr server.

How can I index this into Apache Solr?

Upvotes: 1

Views: 1399

Answers (2)

Jesvin Jose

Reputation: 23098

Write a script in your favorite scripting language (Python, for me). I did something similar with databases, and I hope a similar solution will work for you.

With Python:

  1. urllib2 can fetch the body of your webpage, given the URL.
  2. Use an XML parser such as etree to recursively descend the tree and convert it into an XML or JSON hierarchy of your choosing.
  3. Upload it to Solr (Solr accepts uploads in XML, JSON, CSV, etc.).

Then run this script periodically, for example as a cron job.
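The three steps above can be sketched roughly as follows. This is only a sketch in Python 3 (where urllib.request replaces urllib2), not a tested indexer. The XML layout, the sample values and the function names fetch_body, xml_to_docs and docs_to_json are assumptions made for illustration, since the real web service output is not shown here. It also assumes your Solr version accepts a plain JSON array of documents; older versions may expect a different JSON command format, so check the documentation for your release.

```python
import json
import urllib.request
from xml.etree import ElementTree as ET


def fetch_body(url):
    """Step 1: return the response body of the web service as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")  # assumes a UTF-8 response


def xml_to_docs(xml_text):
    """Step 2: turn each child of the root into a flat dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return [{field.tag: (field.text or "").strip() for field in record}
            for record in root]


def docs_to_json(docs):
    """Step 3 (payload only): serialize the documents as a JSON array."""
    return json.dumps(docs)


# Hypothetical sample standing in for the real web service output.
SAMPLE = """<classes>
  <class>
    <classname>Foo</classname>
    <packagename>com.example</packagename>
    <url>http://example.com/Foo</url>
  </class>
</classes>"""

if __name__ == "__main__":
    print(docs_to_json(xml_to_docs(SAMPLE)))
```

In a real run you would pass the result of fetch_body(your_service_url) to xml_to_docs instead of SAMPLE, and then post the JSON payload to Solr.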


You will need two pieces of code: one to query your RESTful service and acquire the body of the response; the other to upload a formatted document to Solr.

This piece of code uploads a Python object, request_obj, to the given request_url and returns Solr's response as a Python object. A native Python object (composed of dictionaries (associative arrays), lists, strings and numbers) translates to JSON easily (with one or two caveats).

Use this only as a reference. I guarantee no suitability for your purpose.

Don't forget to use /update/json?wt=python, which is available from Solr 3.3 onwards. You need the MultipartPostHandler library.

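import ast
import json
import urllib2
import MultipartPostHandler  # third-party module

def solr_interface(self, request_url, request_obj):
    # Serialize the Python object to JSON (Python 2's json.dumps accepts encoding=).
    request = json.dumps(request_obj, indent=4, encoding="cp1252")
    opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
    urllib2.install_opener(opener)
    req = urllib2.Request(request_url, request)  # a request with a body is sent as POST
    req.add_header("Content-Type", "application/json")
    text_response = urllib2.urlopen(req).read().strip()
    # With wt=python, Solr replies with a Python literal that literal_eval can parse.
    return ast.literal_eval(text_response)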
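For readers on Python 3, where urllib2 no longer exists, a roughly equivalent sketch using only the standard library might look like this. It is not the code above ported line by line: it asks for wt=json instead of wt=python so the reply can be decoded with json.loads, and it drops MultipartPostHandler. The function names build_update_request and send_update, the URL and the sample document are assumptions for illustration.

```python
import json
import urllib.request


def build_update_request(request_url, request_obj):
    """Build (but do not send) a JSON POST request for Solr's update handler."""
    body = json.dumps(request_obj).encode("utf-8")
    return urllib.request.Request(
        request_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request):
    """Send the request and decode Solr's reply (needs a running Solr server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical URL and document; adjust both to your own core and schema.
req = build_update_request(
    "http://localhost:8983/solr/update/json?wt=json",
    [{"classname": "Foo", "packagename": "com.example",
      "url": "http://example.com/Foo"}],
)
```

Building the request separately from sending it makes it easy to inspect the body and headers before anything reaches the server. Call send_update(req) once Solr is running.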

As for parsing (and composing) XML in Python, use these excellent tutorials http://www.learningpython.com/2008/05/07/elegant-xml-parsing-using-the-elementtree-module/ and http://effbot.org/zone/element.htm

This is a sample from the interactive command line.

>>> from xml.etree import ElementTree as ET
>>> elem = ET.fromstring("<doc><p>This is a block</p><p>This is another block</p></doc>")
>>> for subelement in elem:
...     print subelement.text
...
This is a block
This is another block
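The session above only walks one level of the tree. Since Solr documents are flat, nested XML usually has to be flattened first. Here is a small Python 3 sketch of one way to descend recursively and produce a flat dictionary. The dotted key naming, the function name flatten and the sample XML are my own assumptions, and repeated sibling tags would overwrite each other in this simple version (you would need lists and multiValued fields for those).

```python
from xml.etree import ElementTree as ET


def flatten(elem, prefix=""):
    """Recursively collect leaf text into a flat dict keyed by dotted tag paths."""
    name = prefix + elem.tag
    children = list(elem)
    if not children:
        # Leaf element: keep its text, or an empty string if it has none.
        return {name: (elem.text or "").strip()}
    fields = {}
    for child in children:
        fields.update(flatten(child, name + "."))
    return fields


# Hypothetical nested sample, not the real web service output.
xml_text = "<doc><title>Intro</title><meta><author>someone</author></meta></doc>"
print(flatten(ET.fromstring(xml_text)))
```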

Upvotes: 1

Paige Cook

Reputation: 22555

You will need to loosely follow the steps below to index your data.

  1. Configure your Index Schema for the fields you want indexed. Based on the example above, you would want fields for classname, packagename and url. See SolrSchema for more details.
  2. Add the documents to your index. Please see one of the following for details on how you can do this.
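As one illustration of step 2, Solr's XML update format wraps each document in an add element, with one field element per schema field. The Python 3 sketch below builds such a message for the classname, packagename and url fields mentioned in step 1. The function name build_add_message and the sample values are assumptions, and the message still has to be posted to your Solr update handler and followed by a commit.

```python
from xml.etree import ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of flat dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_elem = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each schema field becomes <field name="...">value</field>.
            field = ET.SubElement(doc_elem, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document using the field names from step 1.
message = build_add_message([
    {"classname": "Foo", "packagename": "com.example",
     "url": "http://example.com/Foo"},
])
print(message)
```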

Upvotes: 1
