user2759821

Reputation: 11

Elasticsearch not indexing all documents

I am trying to index all my files stored in MongoDB using Elasticsearch. However, only 180842 files are indexed, whereas I have 1637870 files in my DB. Any idea why not all documents are indexed?

I checked the Elasticsearch log files and there are no errors, but I found the lines below in my log file.

(1) [2013-09-11 02:20:57,539][INFO ][river.mongodb            ] [Arsenic] [mongodb][mongodb] Add attachment: 522bef23649dd3bb06a61fd8
(2) [2013-09-11 02:20:57,539][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver$Indexer] Add Attachment: 522bef0fe819cc4b70875a48 to index mongoindex / type files
(3) [2013-09-11 02:20:57,539][INFO ][river.mongodb            ] [Arsenic] [mongodb][mongodb] Caught file: 522bef230eb5b705cf8ccd91 - /data/Test.java

Line (2) means that the file was added to my index. I am not sure what lines (1) and (3) mean. Do they mean that those files were not added to the index?
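One way to see how many files the river actually handled is to count each message type in the log and compare the totals with the 1637870 files in MongoDB. This is only a sketch: `count_river_messages` is a hypothetical helper, the log path depends on your installation, and counting lines does not by itself explain what each message means.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the river messages quoted above in one log file.
# Matching is fixed-string and case-sensitive, so "Add Attachment:" (line 2)
# does not also count "Add attachment:" (line 1).
count_river_messages() {
  local log=$1
  # grep -c prints 0 (and exits 1) when nothing matches; the count is still printed.
  printf 'caught:  %s\n' "$(grep -c -F 'Caught file:' "$log")"
  printf 'indexed: %s\n' "$(grep -c -F 'Add Attachment:' "$log")"
}
```

Run it as `count_river_messages <your-elasticsearch-log-file>`; if "caught" is far above "indexed", files are being seen but not added.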

Note: I used the following command to create the index:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "submission_data",
    "collection": "fs",
    "gridfs": true
  },
  "index": {
    "name": "mongoindex",
    "type": "files"
  }
}'
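To track the gap while the river runs, you can compare the Elasticsearch document count for `mongoindex` with the number of GridFS files in MongoDB. A minimal sketch follows: `extract_count` and `report_gap` are hypothetical helper names, and the `sed` parsing assumes the single-line JSON that the `_count` endpoint returns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the top-level "count" value out of a _count response,
# e.g. {"count":180842,"_shards":{...}}. Reads the JSON from stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print how many documents have not reached the index.
report_gap() {
  local in_mongo=$1 in_es=$2
  echo "missing: $(( in_mongo - in_es ))"
}
```

For example, `curl -s 'http://localhost:9200/mongoindex/_count' | extract_count` gives the indexed total, and `mongo --quiet submission_data --eval 'print(db.fs.files.count())'` gives the number of stored files (GridFS keeps one `fs.files` document per file). Passing both to `report_gap` shows how far behind the index is.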

Upvotes: 1

Views: 2455

Answers (2)

user3202550


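This is actually caused by your oplog being too small. The oplog is a capped collection, so once it is full the oldest entries are overwritten; any document whose entry is overwritten before the river reads it never reaches the index. If you increase the oplog size, it should work.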
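A rough back-of-the-envelope check of whether the oplog can hold a bulk import is sketched below. It assumes, hypothetically, that each file costs about its own size in oplog space (the GridFS chunk inserts carry the file bytes) and ignores per-entry overhead, so treat the result as an order-of-magnitude estimate only; `oplog_can_hold` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough estimate only: can an oplog of oplog_mb megabytes hold the whole import?
# Assumption (hypothetical): each file needs about avg_file_bytes of oplog space;
# per-entry overhead and the fs.files metadata documents are ignored.
oplog_can_hold() {
  local files=$1 avg_file_bytes=$2 oplog_mb=$3
  local needed=$(( files * avg_file_bytes ))
  local available=$(( oplog_mb * 1024 * 1024 ))
  if (( needed <= available )); then
    echo "fits"
  else
    # Integer division, so the reported size is rounded down.
    echo "overflow: need ~$(( needed / 1024 / 1024 )) MB"
  fi
}
```

For example, `oplog_can_hold 1637870 4096 1024` (1637870 files at a hypothetical 4 KB average against a 1 GB oplog) prints `overflow: need ~6397 MB`. To see the real oplog size and the time window it covers, run `db.printReplicationInfo()` in the mongo shell on the replica set member. Note that the `mongod` option `--oplogSize` (in megabytes) only takes effect when the oplog is first created; an existing oplog has to be resized with MongoDB's documented resize procedure.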

Upvotes: 1

Christos Papoulas

Reputation: 2568

I had the same problem.

If you have many collections, try indexing them one at a time and then restart Elasticsearch. Set gridfs to false, and have a look at how I set up the river for MongoDB:

curl -XPUT "localhost:9200/_river/mongosearch/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "localhost", "port": 27017 }
    ],
    "options": { "secondary_read_preference": false },
    "db": "mydbname",
    "collection": "users",
    "gridfs": false
  },
  "index": {
    "name": "mongosearch",
    "type": "users"
  }
}'

EDIT: The above script does the following:

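  1. Names the river mongosearch (the _river/mongosearch path in the URL).

  2. Defines the MongoDB server: localhost, port 27017.

  3. Sets secondary_read_preference to false, so the river does not read from secondary replica set members.

  4. Uses the database mydbname.

  5. Indexes the users collection.

  6. Sets gridfs to false, because users is an ordinary collection rather than GridFS file storage (GridFS is MongoDB's way of storing large files split into chunks).

  7. Finally, the "index" section sets the Elasticsearch index name (mongosearch) and the type (users) under which the collection's documents are indexed.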
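Following the advice to index one collection at a time, the settings listed above can be generated per collection and registered one river at a time. This is a sketch: `river_meta` is a hypothetical helper, and the database, collection, and index values are placeholders taken from the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the river _meta body shown above for one collection.
# Arguments: database, collection, index name. The collection name is reused
# as the Elasticsearch type, as in the example (type "users").
river_meta() {
  local db=$1 collection=$2 index=$3
  cat <<EOF
{
  "type": "mongodb",
  "mongodb": {
    "servers": [ { "host": "localhost", "port": 27017 } ],
    "options": { "secondary_read_preference": false },
    "db": "$db",
    "collection": "$collection",
    "gridfs": false
  },
  "index": {
    "name": "$index",
    "type": "$collection"
  }
}
EOF
}
```

Usage: `curl -XPUT "localhost:9200/_river/mongosearch/_meta" -d "$(river_meta mydbname users mongosearch)"`, then repeat with a different river name and collection once the first one has finished.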

Upvotes: 0
