Reputation: 51
We are facing some issues with SOLR search.
We are using SOLR 3.1 with Jetty. We have set schema according to our requirement. We have set data-config.xml to import records into the Collection (Core) from our database (Sql Server 2005). There are 320, 000 records in the database which we need to import.
After finished import, when i try to search all the records by SOLR admin
http://localhost:8983/solr/Collection_201/admin/
It shows me total number found 290, 000. So, 30, 000 records are missing.
Now following questions are in my mind
How could i know which record is not properly indexed? OR which record is missing? To know that, i tried a trick, i thought i should have put a field in the database to know that which record is imported into the SOLR collection and which is not. But the big question is how would i update this database field while import from data-config.xml. Because tag allows you only search queries OR in other words something to return. So, i got another idea to still update that database field. I created a stored procedure in my database, which contains update query that would update the field in the database and after that i have select query which is simply return 1 record to fulfill requirement. But when i tried to run DIH with that it returns "Index Failed. Rollback all the changes" error message and nothing imported. When i commented update query into the stored procedure, then it works. So it was not allowing me to run update query even it from stored procedure. So i tried really hard to find a way to update the database from DIH. But i was really failed to find anything Sad smile i refused this idea to update database.
I cleared the index and started import data again. This time i tried it manually run the solr admin import page for 5, 000 records per turn. At the end, for some how records are still missing.
Is this possible it is not committed properly. I red in the documentation that import page (http://localhost:8983/solr/Collection_201/dataimport?command=full-import&clean=false) automatically committed the imported data. But i personally noticed some time it does or sometime it does not. So it is really driving me crazy Sad smile
Now i am fully frustrated and start thinking the way i am using to use SOLR is right or not. If i am right, then is it reliable???? If am wrong, please guide me what is my mistake??
Please Please Please guide me how easily we can sync. collection with our database and make sure it is 100% synced.
Upvotes: 0
Views: 1059
Reputation: 834
Also be aware, that when one record in the batch is wrong, the whole batch gets rolledback. So if it happened 3 times, and you are indexing 10K docs each, that would explain it.
At the time, I solved it: https://github.com/romanchyla/montysolr/blob/master/contrib/invenio/src/java/org/apache/solr/handler/dataimport/NoRollbackDataImporter.java
but there should be a better/more elegant solution than that. I don't know how to get missing records in your case. But if you have indexed the ids, then you can compare the indexed ids with the external source and get the gaps
Upvotes: 0
Reputation: 3671
What field are you using for your IDs in Solr and the database? The id field needs to be unique, so if you have 30,000 records that have the same ID as some 30,000 other records then the data will overwrite those records.
Also, when you run data import handler, you can query it for status (?command=status) and that should tell you the total number of records imported on the last run.
The first thing I would do is check for non-unique IDs in your database WRT the solr id field.
Upvotes: 0