user1427026

Reputation: 879

Elasticsearch not automatically pulling existing MongoDB data once an index has been deleted and recreated

Sorry if this is a silly question, but I can't figure out the solution. I have data stored in MongoDB, and the collections are mapped to Elasticsearch indices using richardwilly's MongoDB river plugin. However, a couple of my indices are messed up, so not all the data I expect to see is in Elasticsearch (it is still in MongoDB). I tried creating a dummy index on dummy data, expecting that after re-indexing I would see this data in Elasticsearch.

The problem seems to be that the MongoDB river operates on the oplog. After deleting and recreating the index, I expected that inserting the first new document would cause the other thousands of documents in MongoDB to automatically become visible in Elasticsearch. However, I only see the documents that I inserted after recreating the index. The thousands of older documents are still visible in MongoDB but not in Elasticsearch.

As a small experiment, I re-inserted 500 documents, and they then became visible in Elasticsearch (provided the index mapping allows them in). Can you please tell me how I can make the data in MongoDB visible in Elasticsearch after I recreate the index, without deleting and re-inserting everything, which I cannot do? Do I need to replay the oplog, or is there another approach you can suggest to get this data into Elasticsearch?

Thanks!

Upvotes: 1

Views: 3210

Answers (3)

coreyt

Reputation: 505

If re-creating the river doesn't work, there are a couple of options.

  1. After you have configured and started your replica set, reload your database with mongodump/mongorestore. Because the river uses the oplog, the data needs to pass through the oplog after you create your river; otherwise the new river will not know the data exists and should be indexed. (This is perhaps easier to do in a development environment.)

  2. Another way that seems possible is to touch all of the objects through the Rails console. Again, make sure your replica set is already running:

    $ bundle exec rails c
    1.9.1 :001 > Person.all.each do |person|
    1.9.1 :002 >     person.save()
    1.9.1 :003?>   end
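
The dump-and-restore route in option 1 might look like the following command sketch. This is not a tested recipe: the host, database name, and dump path are placeholders, and `--drop` assumes you are willing to drop and reload the collections. Run it against the live replica set so the restored writes pass through the oplog:

```shell
# Dump the database from the running replica set (placeholder names throughout).
mongodump --host localhost:27017 --db mydb --out /tmp/dump

# Restore it back into the same replica set. Every re-inserted document
# passes through the oplog, so a freshly created river will see it.
mongorestore --host localhost:27017 --db mydb --drop /tmp/dump/mydb
```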
    

Upvotes: 0

user1427026

Reputation: 879

Answering my own question: the Elasticsearch community helped me out. If you delete the river and create a new one, all the data in the collection it maps to should become available in the Elasticsearch index.
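
With the elasticsearch-river-mongodb plugin, deleting and recreating the river looks roughly like this (a sketch only; the river name, database, collection, and index names below are placeholders you would replace with your own):

```shell
# Delete the old river (placeholder river name "mongodb").
curl -XDELETE 'http://localhost:9200/_river/mongodb/'

# Recreate it. On startup the new river imports the mapped collection,
# so the existing documents get indexed.
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": { "db": "mydb", "collection": "mycollection" },
  "index": { "name": "myindex", "type": "mytype" }
}'
```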

Upvotes: 0

YannCluchey

Reputation: 21

The MongoDB river, as you say, works by tailing Mongo's oplog, which means you can only ever index changes to documents into Elastic (changes to Mongo indexes have no bearing on the oplog). In order to index documents created prior to your first oplog entry, you'll need to find another way.

If you don't want to delete+reinsert, you could perform a bulk update on your existing documents.
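
One way to do that bulk update is a "touch" from the mongo shell: re-saving each document in place generates an oplog entry per document, which the river then picks up. A sketch, with a placeholder collection name:

```javascript
// mongo shell: rewrite every document in place so each one
// passes through the oplog and gets re-indexed by the river.
db.mycollection.find().forEach(function (doc) {
    db.mycollection.save(doc);
});
```

On a large collection this rewrites every document, so expect oplog churn and run it off-peak.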

Alternatively, you could implement a tool that finds the earliest document in Elastic, queries Mongo for any earlier documents, and indexes the missing ones.
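
The core of such a tool is just a set difference between what Mongo holds and what Elastic has already indexed. A minimal Python sketch of that logic, where the in-memory lists are hypothetical stand-ins for a real `collection.find()` and an Elasticsearch scan over the index:

```python
def find_missing(mongo_docs, es_ids):
    """Return the Mongo documents whose _id is not yet present in Elasticsearch."""
    indexed = set(es_ids)
    return [doc for doc in mongo_docs if doc["_id"] not in indexed]

# Stand-in data: a real tool would pull these from pymongo and an
# Elasticsearch client, then bulk-index whatever comes back missing.
mongo_docs = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}, {"_id": 3, "name": "c"}]
es_ids = [2]

missing = find_missing(mongo_docs, es_ids)
print([d["_id"] for d in missing])  # → [1, 3]
```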

Upvotes: 2
