Churro

Reputation: 4366

Elasticsearch for logging - need architectural advice

I am trying to come up with an optimized architecture to store event logging messages on Elasticsearch.

Here are my specs/needs:

So the most frequently executed query will be: get all LogItems given client_id, customer_id, and timestamp range.

Here is what a LogItem looks like:

"_source": {
    "agent_id" : 14,
    "location_id" : 2,
    "customer_id" : 5289,
    "timestamp" : 1320366520000, //Java Long millis since epoch
    "event_type" : 7,
    "screen_id" : 12
}

I need help indexing my data.

I have been reading "What is an Elasticsearch index?" and "Using Elasticsearch to serve events for customers" to get an idea of a good indexing architecture, but I need assistance from the pros.

So here are my questions:

  1. The article suggests creating "one index per day". How would I do range queries with that architecture? (E.g., is it possible to query a range of indices?)

  2. Currently I'm using one big index. If I create one index per location_id, how do I use shards for further organization of my records?

  3. Given the specs above, is there a better architecture you can suggest?

  4. What fields should I filter with vs query with?

EDIT: Here's a sample query run from my app:

{
  "query" : {
    "bool" : {
      "must" : [ {
        "term" : {
          "agent_id" : 6
        }
      }, {
        "range" : {
          "timestamp" : {
            "from" : 1380610800000,
            "to" : 1381301940000,
            "include_lower" : true,
            "include_upper" : true
          }
        }
      }, {
        "terms" : {
          "event_type" : [ 4, 7, 11 ]
        }
      } ]
    }
  },
  "filter" : {
    "term" : {
      "customer_id" : 56241
    }
  }
}

Upvotes: 2

Views: 363

Answers (2)

Jilles van Gurp

Reputation: 8294

Take a good look at logstash (and kibana). They are all about solving this problem. If you decide to roll your own architecture for this, you might copy some of their design.

Upvotes: 1

javanna

Reputation: 60205

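You can definitely search on multiple indices, for instance by using wildcards or a comma-separated list of index names. Keep in mind, though, that index names are strings, not dates, so there is no built-in "range of indices": you have to work out which names cover your time range yourself.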
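To make the multi-index point concrete, here is a minimal sketch of how a client could turn a timestamp range into a comma-separated list of daily index names. The `logs-` prefix and the `YYYY.MM.DD` suffix are assumptions for illustration (the suffix mirrors the convention logstash uses for its daily indices), and `daily_indices` is a hypothetical helper, not part of any Elasticsearch client.

```python
from datetime import datetime, timedelta, timezone


def daily_indices(start_ms, end_ms, prefix="logs-"):
    """Return a comma-separated list of daily index names covering
    the inclusive range [start_ms, end_ms] (epoch milliseconds, UTC).

    The prefix and date format are assumed naming conventions.
    """
    if end_ms < start_ms:
        raise ValueError("end_ms must not be earlier than start_ms")

    # Convert epoch millis to UTC calendar dates.
    start = datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc).date()
    end = datetime.fromtimestamp(end_ms // 1000, tz=timezone.utc).date()

    # One index name per calendar day, both endpoints included.
    days = (end - start).days
    names = [
        prefix + (start + timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days + 1)
    ]
    return ",".join(names)


if __name__ == "__main__":
    # Timestamp range taken from the sample query in the question.
    print(daily_indices(1380610800000, 1381301940000))
```

The resulting string goes in the index position of the search URL (`/<indices>/_search`). When the range lines up with whole months, a wildcard such as `logs-2013.10.*` is shorter and does the same job. The range filter on `timestamp` still belongs in the query body, since the index list only narrows which indices are searched.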

Shards are not for organizing your data but to distribute it and eventually scale out. How you do that is driven by your data and what you do with it. Have a look at this talk: http://vimeo.com/44716955 .

Regarding your question about filters VS queries, have a look at this other question.
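As a rough illustration of the filter side of that distinction: every clause in the sample query from the question is an exact match or a range, so none of them needs relevance scoring, and all of them can run in filter context. The sketch below builds such a request body as a plain Python dict. It assumes Elasticsearch 2.0 or later, where the `bool` query accepts a `filter` clause; on the older versions in use when this was asked, the `filtered` query played that role. `build_log_query` is a hypothetical helper name.

```python
import json


def build_log_query(agent_id, customer_id, start_ms, end_ms, event_types):
    """Build a search body in which every clause runs in filter context.

    Assumes Elasticsearch 2.0+ (bool query with a "filter" clause).
    Field names are taken from the LogItem in the question.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"agent_id": agent_id}},
                    {"term": {"customer_id": customer_id}},
                    {"terms": {"event_type": list(event_types)}},
                    # Inclusive on both ends, like include_lower/include_upper.
                    {"range": {"timestamp": {"gte": start_ms, "lte": end_ms}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_log_query(6, 56241, 1380610800000, 1381301940000, [4, 7, 11])
    print(json.dumps(body, indent=2))
```

Filter clauses skip scoring and can be cached, which suits ID lookups and time ranges. Keep a clause in query context only when you need it to influence relevance ranking, such as full-text matching.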

Upvotes: 2
