Matthew Wilcoxson

Reputation: 3622

Stop Solr replacing colons with underscores in fieldnames

I'm moving a system from Solr 1.4 to Solr 6.x (or possibly 5.x), and the field names all contain colons (e.g. "rdf:type"). I've converted all the configuration files to their Solr 6.x versions, using a schema.xml file. I can see "rdf:type" in Solr's schema view.

These fieldnames worked fine in 1.4 but now colons are automatically converted to underscores when indexing is attempted.

For instance, using Solr's built-in interface, if I try to submit a simple document like:

{'rdf:type': 'http://purl.org/ontology/bibo/Note'}

I get an error message saying:

ERROR: [doc=682e3f70-a4bc-4336-9f69-e7d620fe5fff] unknown field 'rdf_type'

Is it possible to "turn off" this feature? Will using colons cause problems with the newest versions of Solr?

(On a side note, making "rdf:type" a compulsory field and then not including it causes an error which reads: "missing required field: rdf:type", i.e. it displays the correct name)

Upvotes: 0

Views: 311

Answers (1)

MatsLindh

Reputation: 52912

This behaviour is not "native" to Solr itself; it comes from the default update processor chain that the bundled example configuration adds for Schemaless mode (which is the default).

The reason is that Lucene's query syntax uses : to separate a field name from the value being queried in that field, so it's usually easier to keep : out of field names.

You can change this by removing the FieldNameMutatingUpdateProcessorFactory from the update chain, or by using your own schema (without the update processor chain).
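
As a rough sketch, this is approximately what the relevant part of solrconfig.xml looks like in the bundled schemaless example. The chain name and the neighbouring processors are assumptions based on that example config and may differ in your version, so check your own file rather than copying this verbatim:

```xml
<!-- solrconfig.xml (sketch; chain name and neighbouring processors are
     assumed from the bundled data-driven example and may differ) -->
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.UUIDUpdateProcessorFactory"/>

  <!-- Remove or comment out this processor to keep ':' in field names.
       With this pattern, any character other than a word character,
       '-' or '.' is replaced by '_', so "rdf:type" becomes "rdf_type". -->
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>

  <!-- other processors of the chain left unchanged -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

After editing the file, reload the core (or restart Solr) so the modified chain takes effect.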

Upvotes: 2
