Solr: how to specify a schema during JSON and CSV import?

Question

I'm new to Solr and I'm trying out its features. I come from the RDBMS world and was wondering how Solr would perform with my data.

I created a new core:

$ bin/solr create -c test

and successfully loaded a JSON file using:

$ bin/post -c test file.json

The first record of file.json looks like this:

{"attr":"01234"}

but Solr stores it as:

{"attr":1234}

I began defining a Data Import Handler (DIH) following this tutorial (YouTube video) in order to store my data correctly, and found that DIH can't process JSON. I'm stuck at the definition of data-config.xml because the tutorial handles XML files using the XPathEntityProcessor, but I can't find a JSON or even a CSV processor (I can easily produce a CSV version of file.json, so loading CSV or JSON is the same for me). The official documentation is a bit of a mess and doesn't provide many useful examples. The only processors that might handle JSON and CSV documents are LineEntityProcessor and PlainTextEntityProcessor (Official Documentation).

This other link from the Solr Wiki states:

Goals

...

Make it possible to plugin any kind of datasource (ftp,scp etc) and any other format of user choice (JSON,csv etc)

so I guess it is really possible, but HOW?

I found a similar question posted in 2014 that no one answered here, so I was wondering if in 2016, with the newer versions of Solr, there is a well-known solution to this problem.

So the question is: how can I import JSON and CSV documents using a specific data schema?

UPDATE

Executing http://localhost:8983/solr/test/dihupdate?command=full-import doesn't trigger any error, but doesn't load any documents either. Here are the relevant XML files located in the core directory:

solrconfig.xml

...

...

  
    data-config.xml
  

...

schema.xml

...






id
...

data-config.xml

Alexandre Rafalovitch · Accepted Answer

In the Solr distribution, there is a films example (in example/films) that shows how to index JSON and takes advantage of both exact field definitions and type auto-detection. The instructions (README.txt) also show the results you will see if you forget one of the steps.

I suggest you experiment with that and then apply that knowledge to your own use case.
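
Applied to the core from the question, the idea would be to declare the field explicitly through the Schema API before indexing, so that "01234" is kept as a string instead of being auto-detected as a number. The following is only a minimal sketch, not a definitive recipe. It assumes Solr runs on localhost:8983, that the core is the "test" core created above with a managed schema, and that the "string" field type from the default configset is available. The variable and function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumptions: Solr listens on localhost:8983 and the "test"
# core from the question uses a managed schema, so the Schema API can edit it.
SOLR_CORE_URL="http://localhost:8983/solr/test"

# Schema API command: declare "attr" explicitly as a string so that a value
# such as "01234" is not auto-detected as a number.
ADD_FIELD_PAYLOAD='{"add-field":{"name":"attr","type":"string","stored":true}}'

# Hypothetical helper: send the command to the core's /schema endpoint.
define_attr_field() {
  curl -X POST -H 'Content-type:application/json' \
       --data-binary "$ADD_FIELD_PAYLOAD" \
       "$SOLR_CORE_URL/schema"
}

# Run these by hand once Solr is up, before indexing any documents:
#   define_attr_field
#   bin/post -c test file.json    # or the CSV version of the same data
```

If "attr" was already created by auto-detection during an earlier load, add-field will be rejected. In that case the Schema API's replace-field command (or simply recreating the core) followed by a re-index is probably the easier route.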
