Dan M

Reputation: 820

Importing latitude and longitude into a location (LatLonPointSpatialField) field in Solr

Alright, I am looking for general guidelines on how to import a CSV file containing the following fields

poi_name, latitude, longitude

into a Solr (7.x) core so that I can run geo queries on it. What is the right way to achieve this?

Do I really need to go through the trouble of defining a DataImportHandler to do this, or is it sufficient to define a schema for it?

What if the latitude and longitude are already there and I am trying to update the schema with the location field at a later time?

I can't find a good example of doing this. There is an old example where the location field is composed automatically if the latitude and longitude fields have predefined names with a suffix, something like location_1_coordinate and location_2_coordinate, but this seems silly!

Upvotes: 1

Views: 2281

Answers (2)

Dan M

Reputation: 820

Just to conclude and aggregate the answer for anyone interested, this is the solution I arrived at following MatsLindh's suggestion. Context: CentOS 7 and Solr 7.5.

  • Sample.csv content

name,lon,lat
A,22.9308852,39.3724824
B,22.5094530,40.2725792


  • relevant portion of the schema (managed-schema file)

<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
...
<field name="lat" type="string" omitTermFreqAndPositions="true" indexed="true" required="true" stored="true"/>
<field name="location" type="location" multiValued="false" stored="true"/>
<field name="lon" type="string" omitTermFreqAndPositions="true" indexed="true" stored="true"/>


  • solrconfig.xml
<updateRequestProcessorChain name="uuid-location">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lat</str>
    <str name="dest">location</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lon</str>
    <str name="dest">location</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">location</str>
    <str name="delimiter">,</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
  <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
    <lst name="defaults">
      <str name="df">_text_</str>
      <str name="update.chain">uuid-location</str>
    </lst>
  </initParams>

and to import the sample file into the core, run the following in bash:

/opt/solr/bin/post -c your_core_name /opt/solr/sample.csv
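The combined effect of the chain above on each CSV row can be sketched in Python (this is a simulation of what the processors do to a document, not Solr code; field names match the schema above):

```python
import uuid

def apply_chain(doc):
    """Simulate the uuid-location update chain on one document dict."""
    # UUIDUpdateProcessorFactory: assign a fresh UUID if no id is present
    doc.setdefault("id", str(uuid.uuid4()))
    # CloneFieldUpdateProcessorFactory (x2): copy lat, then lon, into location,
    # leaving location temporarily multi-valued
    location_values = [doc["lat"], doc["lon"]]
    # ConcatFieldUpdateProcessorFactory: join the values with ","
    doc["location"] = ",".join(location_values)
    return doc

row = {"name": "A", "lon": "22.9308852", "lat": "39.3724824"}
print(apply_chain(row)["location"])  # "39.3724824,22.9308852"
```

Note that the clone order (lat first, then lon) matters: it produces the "lat,lon" format that the location field and the pt parameter below expect.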


And if you wonder how to query that data, use

http://localhost:8983/solr/your_core_name/select?q=*:*&fq={!geofilt%20sfield=location}&pt=42.27,-74.91&d=1

where pt is the lat-long point and d is the distance in kilometers.
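For intuition about what d=1 means, geofilt keeps documents whose great-circle distance from pt is at most d kilometers. A rough sketch of that distance check, assuming the standard haversine formula and a 6371 km Earth radius:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    EARTH_RADIUS_KM = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# A document matches fq={!geofilt sfield=location} when its distance
# from pt is at most d kilometers.
print(haversine_km(42.27, -74.91, 42.27, -74.91))  # 0.0
```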

Upvotes: 2

MatsLindh

Reputation: 52902

First - you'll have to define a location field. The schemaless mode is made for quick prototyping; if you need more specific fields (and want to be sure that the fields get the correct type in production), you'll have to configure them explicitly. Use the LatLonPointSpatialField type for this, and make it single-valued.

First define the field type to use (these are adapted from the Schema API documentation):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
     "name":"location_type",
     "class":"solr.LatLonPointSpatialField"
  }
}' http://localhost:8983/solr/gettingstarted/schema

Then add a field with that type:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
     "name":"location",
     "type":"location_type",
     "stored":true
  }
}' http://localhost:8983/solr/gettingstarted/schema

The two other issues can be fixed through a custom update chain (you provide the name of the chain as the update.chain URL parameter when indexing the document).

To automagically assign a guid to any indexed document, you can use the UUIDUpdateProcessorFactory. Give the field name (id) as the fieldName parameter.

To get the latitude and longitude concatenated into a single field with , as the separator, you can use a ConcatFieldUpdateProcessorFactory. The important thing here is that it concatenates a list of values given for a single-valued field into a single value - it does not concatenate two different field names. To fix that we can use a CloneFieldUpdateProcessorFactory to move both the latitude and longitude values into a separate field.

<updateRequestProcessorChain name="populate-location">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <arr name="source">
      <str>latitude</str>
      <str>longitude</str>
    </arr>
    <str name="dest">location</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">location</str>
    <str name="delimiter">,</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
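The key behavior can be sketched in Python (a simulation of the processor semantics, not Solr internals): Concat joins multiple values already sitting in one field, which is why Clone must run first to collect latitude and longitude into it:

```python
def clone_fields(doc, sources, dest):
    """Like CloneFieldUpdateProcessorFactory: append each source value to dest."""
    doc.setdefault(dest, []).extend(doc[s] for s in sources)
    return doc

def concat_field(doc, field, delimiter=","):
    """Like ConcatFieldUpdateProcessorFactory: collapse a multi-valued field."""
    doc[field] = delimiter.join(doc[field])
    return doc

doc = {"latitude": "39.37", "longitude": "22.93"}
clone_fields(doc, ["latitude", "longitude"], "location")
concat_field(doc, "location")
print(doc["location"])  # "39.37,22.93"
```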

If you add the location field later and already have the data in your database, this won't work. Solr won't touch data that has already been indexed, and you'll have to reindex to get your information processed and indexed the correct way. This is true regardless of how you get content into the location field.

The old example is probably the other way around - earlier you'd send a latlon pair, and it'd get indexed as two separate values - one for latitude and one for longitude - under the hood. You could probably hack around that by sending a single value for each, but it was really meant to work the other way around - sending one value and getting it indexed as two separate fields. Since the geospatial support in Lucene (and Solr) was just starting out, the already existing types were re-used instead of creating more dedicated types.

Upvotes: 1
