Index xml output of a rest web service into a solr server

Question

How can I index the content of a web service into a Solr server?

My web service output looks like this: (image not available)

Now I want to index the content under xml, as shown above, into the Solr server.

How can I index this into Apache Solr?

Jesvin Jose · Accepted Answer

Write a script in your favorite scripting language (Python for me). I did something similar with databases, and I hope a similar solution works for you.

With Python:

urllib2 can fetch the body of your web page, given the URL.
Use an XML parser such as etree to recursively descend the tree and convert it into an XML or JSON hierarchy of your choosing.
Upload it to Solr (Solr accepts uploads in XML, JSON, CSV, etc.).
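The parsing step above could look roughly like this. It is a minimal sketch in Python 3, not a definitive implementation: the function name xml_to_docs, the record tag "item", and the sample XML are all assumptions, since the real structure of your web service output is not shown here. It is also deliberately flat (one level of child elements per record) rather than fully recursive.

```python
from xml.etree import ElementTree as ET


def xml_to_docs(xml_text, record_tag="item"):
    """Turn each <record_tag> element into a flat dict of child tag -> text.

    record_tag is hypothetical; replace it with whatever element wraps
    one record in your web service's XML.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.iter(record_tag):
        doc = {}
        for child in record:
            # Keep only leaf children that carry non-empty text.
            if len(child) == 0 and child.text and child.text.strip():
                doc[child.tag] = child.text.strip()
        docs.append(doc)
    return docs


# Made-up input, for illustration only.
sample = (
    "<items>"
    "<item><id>1</id><title>First</title></item>"
    "<item><id>2</id><title> Second </title></item>"
    "</items>"
)
docs = xml_to_docs(sample)
```

Each resulting dict maps naturally onto one Solr document, provided the keys match field names in your Solr schema.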

Then run this script periodically, for example as a cron job.

You will need two pieces of code: one to query your RESTful service and acquire the body of the response; the other to upload a formatted document to Solr.
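For the first piece, here is a minimal sketch in Python 3, where urllib2 became urllib.request. The function name fetch_body, the timeout value, and the URL in the comment are my own placeholders, not anything from your service.

```python
import urllib.request


def fetch_body(url, timeout=10):
    """Return the raw response body of `url` as bytes.

    Returning bytes (not str) lets the XML parser honour the encoding
    declared in the XML document itself.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Usage (hypothetical URL, not executed here):
#   body = fetch_body("http://example.com/rest/items")
```

The returned bytes can be passed straight to ElementTree.fromstring.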

The piece of code below uploads a Python object request_obj to the given request_url and returns Solr's response as a Python object. A native Python object (composed of dictionaries (associative arrays), lists, strings, and numbers) translates to JSON easily, with one or two caveats.

Use this only as a reference. I guarantee no suitability for your purpose.

Don't forget to use /update/json?wt=python, which is available from Solr 3.3 onwards. You need the MultipartPostHandler library.

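# Python 2 code; MultipartPostHandler is a third-party module.
import ast
import json
import urllib2
import MultipartPostHandler

def solr_interface(request_url, request_obj):
    # Serialize the Python object to JSON for Solr's JSON update handler.
    request = json.dumps(request_obj, indent=4, encoding="cp1252")
    opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
    urllib2.install_opener(opener)
    req = urllib2.Request(request_url, request)
    req.add_header("Content-Type", "application/json")
    text_response = urllib2.urlopen(req).read().strip()
    # With wt=python, Solr replies with a Python literal that literal_eval can parse.
    return ast.literal_eval(text_response)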
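If you are on Python 3, a rough equivalent using only the standard library might look like the sketch below. Note the differences from the code above: it asks Solr for wt=json and parses the reply with json.loads instead of using wt=python and ast.literal_eval, and it assumes a Solr version whose JSON update handler accepts a plain JSON array of documents. The function names, the core name mycore, and the URL are placeholders.

```python
import json
import urllib.request


def build_update_request(update_url, docs):
    """Build a POST request carrying `docs` (a list of dicts) as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request, timeout=10):
    """Send the request and decode Solr's JSON reply (requires wt=json)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical Solr URL and documents, for illustration only.
request = build_update_request(
    "http://localhost:8983/solr/mycore/update?commit=true&wt=json",
    [{"id": "1", "title": "First"}],
)
# reply = send_update(request)  # needs a running Solr instance
```

Building the request separately from sending it makes it easy to inspect the payload before anything reaches Solr.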

As for parsing (and composing) XML in Python, see these excellent tutorials: http://www.learningpython.com/2008/05/07/elegant-xml-parsing-using-the-elementtree-module/ and http://effbot.org/zone/element.htm

This is a command-line sample:

>>> from xml.etree import ElementTree as ET
>>> elem = ET.fromstring("<doc><p>This is a block</p><p>This is another block</p></doc>")
>>> for subelement in elem:
...     print subelement.text
...
This is a block
This is another block
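The sample above only looks at the direct children of the root. For nested XML, a recursive walk is one way to descend the whole tree. This is a sketch in Python 3; the helper name walk and the path format are my own, and the input XML is made up.

```python
from xml.etree import ElementTree as ET


def walk(element, path=""):
    """Yield (path, text) for every element that has non-empty text."""
    current = f"{path}/{element.tag}"
    text = (element.text or "").strip()
    if text:
        yield current, text
    # Recurse into the children, carrying the path built so far.
    for child in element:
        yield from walk(child, current)


# Made-up nested input, for illustration only.
tree = ET.fromstring(
    "<doc><p>This is a block</p><section><p>Nested block</p></section></doc>"
)
pairs = list(walk(tree))
```

The paths can then be mapped to Solr field names however suits your schema.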
