Paritosh Ahuja

Reputation: 1249

Apache Solr: confusion about commit and indexing

I have a question about Apache Solr.

  1. I created a core called "sampleCore", added XML data, and indexed it using post.jar.
  2. I added a new document using the following command:

curl "http://localhost:8983/solr/sampleCore/update?commitWithin=1000" -H "Content-Type: text/xml" --data-binary 'testdoc5'

Still, this newly added document does not show up in Solr.

When I reindex the data using post.jar, it shows up.

What is the use of commit, then? Do I need to reindex the data every time I add or remove a document from Solr?

Upvotes: 2

Views: 1191

Answers (1)

Eric

Reputation: 24880

It seems you are right: by default an update is not committed, for performance reasons, so you can't see the document until a commit happens.

This link describes ways to commit documents: https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

There are several ways to achieve NRT:

  • Add the parameter commit=true. This is a hard commit, which is costly.
  • Add the parameter optimize=true. This is like a hard commit but even more costly, because it merges all segments into a single segment.
  • Add the parameter commitWithin=<milliseconds>, as used in your question. By default this is a soft commit, which is less costly, but it is not replicated to slaves in a master/slave setup.
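
The three options above differ only in the query parameter appended to the update URL. As a rough sketch, assuming Solr on localhost:8983 and the core name from the question (the helper name solr_update_url is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the update URL for a core with one commit option.
# Assumes Solr listens on localhost:8983, as in the question.
solr_update_url() {
  local core="$1" option="$2"
  printf 'http://localhost:8983/solr/%s/update?%s\n' "$core" "$option"
}

# Print the URL for each of the three options listed above.
solr_update_url sampleCore 'commit=true'        # hard commit
solr_update_url sampleCore 'optimize=true'      # hard commit plus segment merge
solr_update_url sampleCore 'commitWithin=2000'  # commit within 2000 ms
```

The printed URL can then be passed to curl in place of the literal URL, together with the same Content-Type header and request body as usual.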

I tested commitWithin=<milliseconds> via:

curl "http://localhost:8983/solr/dummy/update?wt=json&indent=true&commitWithin=2000" -H "Content-Type: application/json" -d '[
{"id":"4", "name":"Bob Smith", "create_date":"2016-02-16T14:36:12Z"}
]'

After 2 seconds, the document is searchable, so commitWithin should work for you too.
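
One way to check this yourself is to wait for the commitWithin interval and then query for the document by id. This is only a sketch: it assumes the default /select handler, the "dummy" core and id "4" from my test above, and the helper names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a select URL for a core and a query string.
# Assumes Solr on localhost:8983 and the default /select handler.
solr_select_url() {
  local core="$1" query="$2"
  printf 'http://localhost:8983/solr/%s/select?q=%s&wt=json&indent=true\n' "$core" "$query"
}

# Hypothetical helper: wait, then run the query so the result can be inspected.
check_visible() {
  local core="$1" query="$2" wait_seconds="$3"
  sleep "$wait_seconds"   # give the commitWithin interval time to elapse
  curl -s "$(solr_select_url "$core" "$query")"
}
```

For example, running check_visible dummy 'id:4' 2 right after the update should return a response that contains the document if the commit has happened.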

In your case you need to wait 1 second before querying. If you are using master/slave replication, query the master rather than a slave.

I am not very familiar with Solr either, but I hope this helps.


@Update:

I just tried it with XML data. It also works in my Solr (5.4.1, on Linux): the document is searchable within 2 seconds.

Reference: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers

For example:

curl "http://localhost:8983/solr/dummy/update?wt=json&indent=true&commitWithin=2000" -H "Content-Type: text/xml" --data-binary '
<add>
  <doc>
    <field name="id">5</field>
    <field name="name">Jennifer Aniston</field>
    <field name="create_date">2016-02-16T23:10:21Z</field>
  </doc>
</add>'

I didn't do any additional configuration for XML, so I wonder whether you changed the default NRT configuration.
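
If you want to check that, one possibility (assuming your Solr version has the Config API, which I believe 5.x does) is to fetch the updateHandler section of the core's configuration and look at the commit-related settings. The helper names below are made up, and the core name is taken from your question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: URL of the Config API section that holds commit settings.
# Assumes Solr on localhost:8983 and that the Config API is available.
solr_update_handler_url() {
  local core="$1"
  printf 'http://localhost:8983/solr/%s/config/updateHandler\n' "$core"
}

# Hypothetical helper: fetch that section so the settings can be read.
show_commit_settings() {
  curl -s "$(solr_update_handler_url "$1")"
}
```

Running show_commit_settings sampleCore should print the update handler settings. The entries to look at are the ones for autoCommit, autoSoftCommit and commitWithin.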

Upvotes: 2
