byc

Reputation: 141

Reindexing in Elasticsearch does not return all documents

I have about 1.5 million documents in Elasticsearch. I want to reindex them into several indices: each index holds the documents that contain certain keywords, and one (the null index) holds the documents that contain none of the keywords used for the other indices. I'm not sure why my new indices hold fewer documents than expected. In particular, I expect about 1.2 million documents in the null index, but the new index only has about 30k. I would appreciate ideas on what I've done wrong here!

This is how I reindex the documents that contain certain keywords in multiple fields:

curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_keywords"
  }
}'

Then I use must_not to create another index with the documents that contain neither keyword1 nor keyword2.

curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "must_not": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_null"
  }
}'

The null index ended up with 29.7k documents. From the response below, it looks like I should expect 1.28 million documents. The error also says I need to increase the field limit on the index, which I did after running the commands above, but the number of documents stays the same.

{"took":53251,"timed_out":false,"total":1277428,"updated":243,"created":29755,"deleted":0,"batches":30,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"analysis_null","type":"_doc","id":"/email/.......msg","cause":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [analysis_null] has been exceeded"},"status":400}]

Upvotes: 0

Views: 1231

Answers (1)

Joe - Check out my books

Reputation: 16943

The error means exactly what it says: a hard limit on the total number of fields was exceeded during the reindex.
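
The counts in the response are consistent with the job stopping early rather than with the query matching too few documents. A reindex aborts when a batch reports failures, and it reads the source in batches (1,000 documents each by default, assuming `source.size` was not changed). A rough check of the numbers from the question, as a sketch:

```shell
# Values copied from the reindex response in the question
total=1277428    # documents matched by the source query
created=29755
updated=243
batches=30

written=$((created + updated))    # documents that were written to analysis_null
remaining=$((total - written))    # documents that were never copied

echo "written=$written remaining=$remaining"
echo "batches x 1000 = $((batches * 1000))"
```

About 30 batches were processed before the field-limit failure ended the run, which lines up with the roughly 30k documents seen in the new index.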

Doesn't raising that limit before reindexing solve the problem? For example, recreate the destination index with a higher limit and then run the reindex again:

DELETE analysis_null

PUT analysis_null
{
  "settings": {
    "index.mapping.total_fields.limit": 10000
  }
}
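
If deleting the index is not desirable, the limit can also be raised on the existing index, since `index.mapping.total_fields.limit` is a dynamic setting. A minimal sketch with curl, assuming the host from the question; the `ES` variable and the function names are only illustrative:

```shell
ES="${ES:-http://abcdef2344:9200}"   # host from the question; override as needed

# Raise the field limit on the existing destination index.
raise_field_limit() {
  curl --silent --request PUT "$ES/analysis_null/_settings" \
    --header 'Content-Type: application/json' \
    --data-raw '{"index.mapping.total_fields.limit": 10000}'
}

# Count the documents currently in the destination index.
count_null_docs() {
  curl --silent --request GET "$ES/analysis_null/_count"
}
```

Call `raise_field_limit`, run the `_reindex` request again, then compare the output of `count_null_docs` with the `total` reported by the reindex. Either way, the reindex has to be re-run after the limit changes; changing the setting alone does not copy the documents that were skipped.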

Upvotes: 1
