ms_stud

Reputation: 381

Saving logs for long period - elasticsearch

I'm new to elasticsearch.

I have logs in Elasticsearch that I need to keep for more than one month, but my Elasticsearch configuration only retains index data for one month.

So I've searched and found the following solutions:

1. The Reindex API in Elasticsearch, which can copy logs from one index to another:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my-logstash-2020.07.27",
    "query": { "bool": { "must": [], "filter": [ { "match_all": {} }, { "range": { "@timestamp": { "gte": "2020-07-27T10:45:33.178Z", "lte": "2020-07-27T11:00:33.178Z", "format": "strict_date_optional_time" }}}]}}
  },
  "dest": {
    "index": "my-dump-2020.07.27"
  }
}

This returns a task that I can poll to see whether the logs were reindexed into the new index my-dump-2020.07.27 (after checking, I saw that this takes ~5 minutes for an index with >=100K logs).
Now, from what I understood, this new index will also be kept for only one month, so I would need to perform this operation again and again until I no longer need those logs and can delete them (manually, or automatically using Watcher etc.).
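Since the reindex was started with wait_for_completion=false, its progress can be polled through the task management API. A minimal sketch (the task id below is a made-up example; use the one returned by the reindex call):

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345

The response contains "completed": true once the reindex has finished, along with counters such as how many documents were created in the destination index.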
2. Take a snapshot of this index (or of the filtered index produced by reindexing):

PUT /_snapshot/my_backup/my-snapshot-2020.07.27?wait_for_completion=false
{
  "indices": "my-logstash-2020.07.27",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "me",
    "taken_because": "I need it"
  }
}
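Note that before taking a snapshot, the repository (my_backup above) has to be registered. A minimal sketch for a shared-filesystem repository (the path is an example, and it must also be listed under path.repo in elasticsearch.yml):

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}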

I understood that the snapshot API has no ability to filter the data by a query, so I need to combine reindex and snapshot to snapshot only the filtered data.
I didn't check how much time this will take, but from what I've read it takes a while and affects Elasticsearch performance.
3. Download those logs from Elasticsearch to my local machine (a sort of dump), but I understood that when the number of logs is >=100K this takes a lot of time and might fail.

My question is: is there a mechanism built into Elasticsearch that gives me the ability to store some "filtered" log data for a long period (three months, for example) without affecting its performance (or affecting it as little as possible)?

Upvotes: 0

Views: 1290

Answers (1)

ms_stud

Reputation: 381

Solution:

After deeper investigation I found that Elasticsearch has Index Lifecycle Management (ILM), which applies a policy to indices.

How it works: an ILM policy can be attached to an index template; it is then applied to every index whose name matches one of the patterns in that template's index_patterns list.

A policy is composed of up to five lifecycle phases:

Hot: The index is actively being updated and queried.
Warm: The index is no longer being updated but is still being queried.
Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it’s okay if those queries are slower.
Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it’s okay if those queries are extremely slow.
Delete: The index is no longer needed and can safely be removed.

(Some of the phases are mandatory and some are not.)

The "Delete" life cycle can be configured so logs will be removed after custom period time (longer then 30 days).

Therefore, combining the first solution (reindex) with ILM gives me the ability to keep indices for a long period.

Upvotes: 1
