Reputation: 381
I'm new to Elasticsearch.
I have logs in Elasticsearch that I need to keep for more than one month, but my Elasticsearch configuration allows data to stay in an index for only one month.
So I've searched and found the following solutions:
1. The Reindex API in Elasticsearch, which can copy logs from one index to another:
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my-logstash-2020.07.27",
    "query": {
      "bool": {
        "filter": [
          {
            "range": {
              "@timestamp": {
                "gte": "2020-07-27T10:45:33.178Z",
                "lte": "2020-07-27T11:00:33.178Z",
                "format": "strict_date_optional_time"
              }
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "my-dump-2020.07.27"
  }
}
This returns a task ID that I can poll to see whether the logs were reindexed into the new index my-dump-2020.07.27 (after checking, I saw that this takes ~5 minutes for an index with >=100K logs).
Now, from what I understood, the new index will also be available for only one month, so I will need to perform this operation again and again until I no longer need those logs and can delete them (manually, or automatically using Watcher etc.).
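For completeness, the task returned by the reindex call above can be polled with the Tasks API (the task ID below is a placeholder; use the one returned by your own call):

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345

When the response contains "completed": true, the reindex has finished.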
2. Take a snapshot of this index (or of the filtered index after reindexing).
PUT /_snapshot/my_backup/my-snapshot-2020.07.27?wait_for_completion=false
{
  "indices": "my-logstash-2020.07.27",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "me",
    "taken_because": "I need it"
  }
}
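One detail worth noting: the snapshot call above assumes a repository named my_backup is already registered. A minimal sketch of registering a shared-filesystem repository (the location path is a placeholder and must be listed under path.repo in elasticsearch.yml):

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}

Later, the data can be brought back with the restore API, e.g. POST /_snapshot/my_backup/my-snapshot-2020.07.27/_restore.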
I understood that a snapshot has no ability to filter the data by a query, so I would need to combine reindex and snapshot to take a snapshot of filtered data.
I didn't check how much time it will take, but from what I've read it takes a while and affects Elasticsearch performance.
3. Download those logs from Elasticsearch to my local machine (a sort of dump), but I understood that when the number of logs is >=100K this will take a lot of time and might fail.
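For reference, the usual way to dump a large result set without hitting result-window limits is the Scroll API (or search_after in newer versions) rather than one large search; a sketch, with an assumed page size of 1000:

POST /my-logstash-2020.07.27/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

Each response includes a _scroll_id; subsequent pages are fetched with it until no hits remain:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<scroll_id from the previous response>"
}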
My question is:
Is there some mechanism built into Elasticsearch that gives me the ability to store some "filtered" log data for a long period (3 months, for example) without affecting performance (or affecting it as little as possible)?
Upvotes: 0
Views: 1290
Reputation: 381
Solution:
After deeper investigation I found that Elasticsearch has Index Lifecycle Management (ILM), which applies a policy to newly created indices.
How it works: an ILM policy can be attached to an index template, and it is then applied to every index whose name matches one of the patterns in that template's index_patterns list.
A policy is built from up to five lifecycle phases:
Hot: The index is actively being updated and queried.
Warm: The index is no longer being updated but is still being queried.
Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it’s okay if those queries are slower.
Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it’s okay if those queries are extremely slow.
Delete: The index is no longer needed and can safely be removed.
(Some of the phases are required and some are optional.)
The "Delete" phase can be configured so that logs are removed after a custom period of time (longer than 30 days).
Therefore, combining the first solution with ILM gives me the ability to keep indices for a long period.
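A minimal sketch of such a policy and template (the names my-dump-policy / my-dump-template and the 90-day retention are illustrative; _index_template is the composable template endpoint in recent versions, while older versions use _template):

PUT _ilm/policy/my-dump-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

PUT _index_template/my-dump-template
{
  "index_patterns": ["my-dump-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-dump-policy"
    }
  }
}

With this in place, every index created by the reindex step whose name matches my-dump-* is picked up by the policy and deleted automatically after 90 days.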
Upvotes: 1