Reputation: 4366
I am trying to come up with an optimized architecture for storing event logging messages in Elasticsearch.
Here are my specs/needs:
timestamp range queries.
agent and customer interactions (in addition to other fields).
customers and agents belong to the same location.
So the most frequently executed query will be: get all LogItems given client_id, customer_id, and timestamp range.
Here is what a LogItem looks like:
"_source": {
    "agent_id" : 14,
    "location_id" : 2,
    "customer_id" : 5289,
    "timestamp" : 1320366520000, // Java long, millis since epoch
    "event_type" : 7,
    "screen_id" : 12
}
I need help indexing my data.
I have been reading "What is an Elasticsearch index?" and "Using Elasticsearch to serve events for customers" to get an idea of a good indexing architecture, but I need assistance from the pros.
So here are my questions:
The article suggests creating "one index per day". How would I do range queries with that architecture? (E.g., is it possible to query over a range of indices?)
Currently I'm using one big index. If I create one index per location_id, how do I use shards to further organize my records?
Given the specs above, is there a better architecture you can suggest?
Which fields should I filter on vs. query on?
EDIT: Here's a sample query run from my app:
{
    "query" : {
        "bool" : {
            "must" : [ {
                "term" : {
                    "agent_id" : 6
                }
            }, {
                "range" : {
                    "timestamp" : {
                        "from" : 1380610800000,
                        "to" : 1381301940000,
                        "include_lower" : true,
                        "include_upper" : true
                    }
                }
            }, {
                "terms" : {
                    "event_type" : [ 4, 7, 11 ]
                }
            } ]
        }
    },
    "filter" : {
        "term" : {
            "customer_id" : 56241
        }
    }
}
Upvotes: 2
Views: 363
Reputation: 8294
Take a good look at Logstash (and Kibana). They are all about solving this problem. If you decide to roll your own architecture for this, you might copy some of their design.
Upvotes: 1
Reputation: 60205
You can definitely search across multiple indices, for instance by using wildcards or a comma-separated list of index names. Keep in mind, though, that index names are strings, not dates, so a date range has to be translated into the matching index names.
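As a rough illustration of the multi-index approach with one index per day, here is a minimal Python sketch that turns a millisecond timestamp range into a comma-separated list of index names. The naming scheme (a prefix followed by -YYYY.MM.DD), the "logs" prefix, the helper names, and the choice of UTC day boundaries are all assumptions made for the example, not something taken from the question.

```python
from datetime import datetime, timedelta, timezone


def daily_index_names(prefix, start_ms, end_ms):
    """Return one index name per UTC day touched by [start_ms, end_ms].

    Assumes a hypothetical naming scheme of '<prefix>-YYYY.MM.DD'.
    """
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc).date()
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc).date()
    names = []
    day = start
    while day <= end:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


def search_path(prefix, start_ms, end_ms):
    """Build the URL path for a search across all matching daily indices."""
    return "/" + ",".join(daily_index_names(prefix, start_ms, end_ms)) + "/_search"


if __name__ == "__main__":
    # Timestamp range taken from the sample query in the question.
    print(search_path("logs", 1380610800000, 1381301940000))
```

The timestamp range filter still belongs in the query body; the index list only narrows which indices are searched. For long ranges a wildcard such as logs-2013.10.* keeps the path short, at the cost of possibly touching a few extra days.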
Shards are not for organizing your data; they distribute it so that you can eventually scale out. How you do that depends on your data and what you do with it. Have a look at this talk: http://vimeo.com/44716955.
Regarding filters vs. queries, have a look at this other question.
Upvotes: 2