Reputation: 141
I have about 1.5 million documents in Elasticsearch. I want to reindex them into several new indices: each one holds the documents that contain certain keywords, plus one (the null index) for documents that contain none of the keywords used for the other indices. My new indices contain fewer documents than expected. In particular, I expect about 1.2 million documents in the null index, but the new index only has about 30k. I would appreciate ideas on what I've done wrong here!
This is how I reindex documents containing certain keywords in multiple fields:
curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_keywords"
  }
}'
Then I use must_not to create another index holding the documents that contain neither keyword1 nor keyword2.
curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "must_not": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_null"
  }
}'
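Since the two request bodies differ only in should versus must_not and in the destination index, one way to keep them in sync is to generate both from the same keyword list. Below is a minimal sketch in Python, assuming the index and field names from the commands above; the helper names keyword_clauses and reindex_body are hypothetical and only build the JSON, they do not send it.

```python
import json

# Fields searched by every multi_match clause (taken from the commands above).
FIELDS = ["content", "meta.raw.Message:Raw-Header:Subject"]


def keyword_clauses(keywords):
    """One multi_match clause per keyword, always over the same fields."""
    return [{"multi_match": {"fields": FIELDS, "query": kw}} for kw in keywords]


def reindex_body(keywords, dest_index, match=True, source_index="mydocs_email_*"):
    """Build a _reindex request body (hypothetical helper).

    match=True  -> documents containing at least one keyword ("should")
    match=False -> documents containing none of the keywords ("must_not")
    """
    occur = "should" if match else "must_not"
    query = {"bool": {"filter": [{"bool": {occur: keyword_clauses(keywords)}}]}}
    return {
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
    }


keywords = ["keyword1", "keyword2"]
keyword_body = reindex_body(keywords, "analysis_keywords", match=True)
null_body = reindex_body(keywords, "analysis_null", match=False)

# Print the body that would be passed to --data-raw for the null index.
print(json.dumps(null_body, indent=2))
```

Adding a keyword to the list then changes both bodies at once, so the null index stays the complement of the keyword index.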
The null index ended up with only about 29.7k documents. From the response below, it looks like I should expect about 1.28 million documents. The error also says the limit on the number of fields in the index was exceeded, so I increased that limit after running the commands above, but the number of documents stayed the same.
{"took":53251,"timed_out":false,"total":1277428,"updated":243,"created":29755,"deleted":0,"batches":30,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"analysis_null","type":"_doc","id":"/email/.......msg","cause":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [analysis_null] has been exceeded"},"status":400}]
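To make the gap explicit, the counters in this response can be compared directly. The following is a rough sketch in Python that parses a trimmed copy of the response (the document id is left elided as in the original) and reports how many documents were written; it assumes that total is the number of documents the source query matched, which is what the numbers here suggest. The function name summarize_reindex is hypothetical.

```python
import json

# Trimmed copy of the response above; the id is elided exactly as in the original.
response_text = """
{"took":53251,"timed_out":false,"total":1277428,"updated":243,"created":29755,
 "deleted":0,"batches":30,"version_conflicts":0,"noops":0,
 "failures":[{"index":"analysis_null","type":"_doc","id":"/email/.......msg",
   "cause":{"type":"illegal_argument_exception",
            "reason":"Limit of total fields [1000] in index [analysis_null] has been exceeded"},
   "status":400}]}
"""


def summarize_reindex(response):
    """Compare the matched count with the number of documents actually written."""
    written = response["created"] + response["updated"]
    # Collapse repeated failure messages into a sorted list of distinct reasons.
    reasons = sorted({f["cause"]["reason"] for f in response.get("failures", [])})
    return {
        "matched": response["total"],
        "written": written,
        "missing": response["total"] - written,
        "failure_reasons": reasons,
    }


summary = summarize_reindex(json.loads(response_text))
print(summary)
```

For this response, written comes to 29998 (29755 created plus 243 updated) out of 1277428 matched, and the only failure reason is the field limit.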
Upvotes: 0
Views: 1231
Reputation: 16943
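The error means exactly what it says: the limit on the total number of fields in the index (1000 here) was exceeded during the reindex. The response shows only 30 batches, so the reindex appears to have stopped at that point, and raising the limit afterwards does not add the documents that were never written; the reindex has to be run again.
Doesn't changing that setting before reindexing solve the problem?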
DELETE analysis_null

PUT analysis_null
{
  "settings": {
    "index.mapping.total_fields.limit": 10000
  }
}
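The order matters here: drop the partial index, recreate it with the higher limit, and only then rerun the reindex. Here is a minimal sketch of that sequence in Python using only the standard library, assuming the host from the question; rebuild_steps and send are hypothetical helpers, and the reindex body shown is a placeholder that should be replaced with the full must_not query.

```python
import json
import urllib.request

BASE_URL = "http://abcdef2344:9200"  # host taken from the question


def rebuild_steps(dest_index, reindex_body, field_limit=10000):
    """Ordered (method, path, body) steps: drop, recreate with a higher limit, reindex."""
    settings = {"settings": {"index.mapping.total_fields.limit": field_limit}}
    return [
        ("DELETE", f"/{dest_index}", None),
        ("PUT", f"/{dest_index}", settings),
        ("POST", "/_reindex", reindex_body),
    ]


def send(method, path, body=None):
    """Send one request and return the decoded JSON reply (raises on HTTP errors)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Placeholder body: substitute the full must_not query from the question.
body = {"source": {"index": "mydocs_email_*"}, "dest": {"index": "analysis_null"}}
steps = rebuild_steps("analysis_null", body)

# Dry run: list the steps. To execute, call send(method, path, payload) for each.
for method, path, payload in steps:
    print(method, path)
```

Note that send raises urllib.error.HTTPError on a non-2xx reply, for example if the DELETE targets an index that does not exist, so a real script would need to handle that case.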
Upvotes: 1