Reputation: 312
I am running Solr 5.4.1 and Nutch 1.11. I am also using Apache NiFi, in particular the GetSolr processor.
I understand that the tstamp in my SolrRecord is the time at which the value in the index was fetched.
The challenge is that, for the GetSolr processor to run unattended in NiFi, I need to provide a date field to filter on. If I use tstamp, it only populates my dataflow the first time; after that, the tstamp filter excludes newer values, because tstamp reflects the fetch time and not the time the record was ingested into Solr.
So my question is: how can I include a field in my SolrRecord, at the time of bin\nutch index, that holds the timestamp of insertion into Solr rather than the time of fetching by the crawler?
Upvotes: 0
Views: 341
Reputation: 18640
I think you would have two options...
First, you could add a new date field to your Solr schema.xml with a default value of NOW:
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
Alternatively, you could use the TimestampUpdateProcessorFactory: https://lucene.apache.org/solr/5_4_1/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
In solrconfig.xml you would add this to an update chain:
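<updateRequestProcessorChain name="add-timestamp-field">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">timestamp</str>
  </processor>
  <!-- A custom chain must end with RunUpdateProcessorFactory, or documents are not actually indexed -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>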
If using the update chain, the add-timestamp-field chain needs to be enabled:
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-timestamp-field</str>
  </lst>
</initParams>
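Once a field named timestamp is populated at insertion time, you can check it with a date range filter similar in spirit to what an incremental fetch needs. The following is only a minimal sketch: the helper names (solr_date, build_incremental_query), the host, and the core name "nutch" are assumptions for illustration, and it does not reproduce the exact query GetSolr issues internally.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(dt: datetime) -> str:
    """Format a timezone-aware datetime in Solr's UTC date syntax."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_incremental_query(last_run: datetime, field: str = "timestamp") -> str:
    """Build select parameters for documents indexed at or after last_run.

    'field' is assumed to be the insertion-time field defined in the schema.
    """
    params = {
        "q": "*:*",
        # Inclusive range from the last run up to the newest document
        "fq": f"{field}:[{solr_date(last_run)} TO *]",
        "sort": f"{field} asc",
        "wt": "json",
    }
    return urlencode(params)


if __name__ == "__main__":
    last_run = datetime(2016, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
    # Hypothetical host and core name; adjust to your installation.
    print("http://localhost:8983/solr/nutch/select?" + build_incremental_query(last_run))
```

If documents indexed after last_run show up in the response while older ones do not, the field is being set at insertion time and should be usable as the date field in GetSolr.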
Upvotes: 1