Reputation: 7535
I have a server API that has a few application instances and a worker instance. Currently, the applications send some data to Loggly (a SaaS centralized logging service). This was good to get started, but I'm starting to look into creating a setup using some open source software.
Besides the current cost of using Loggly, my biggest concern is that connecting to Loggly at the end of each request to log the data is adding time to the requests.
I've been reading a bit about Logstash, Graphite, Elasticsearch, etc., in conjunction with logrotate, and some sources seem to suggest writing to a local file on each server and then, when rotating, sending the logs off to Logstash.
I'm curious what practices people find most efficient in centralized logging scenarios. Should I be writing to local files on each server first? Or is that making each box too "stateful", and should I instead just send the data directly off to Logstash, or to SQS, for processing by a centralized server?
Upvotes: 0
Views: 510
Reputation: 1303
When it comes to centralized logging, there are implementation differences between tightly coupling your log-producers to logstash and coupling them loosely. At very large scale, tight coupling should be avoided in the middle (centralization) tier. Tight coupling means opening a socket between your producer and your receiver and transmitting events over it, which can stall the producer if the receiver is slow.
Loose coupling can come in a variety of forms; the very large centralized logging systems I know all use some form of queue mediation in the centralization tier.
That said, at the edges the use-cases are different. If you need to avoid writing to files in order to reduce I/O, using TCP or UDP sockets to transmit to a locally installed logstash (that then ships the events to a central queue) can be quite fast.
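As a sketch of that edge setup (hostnames, ports, and the queue key here are placeholders, not anything from your environment), a locally installed logstash listening on sockets and shipping to a central queue could look like:

```
# instance-logstash (shipper): accept events from the app over local
# TCP/UDP sockets, do no filtering, and push them to a central Redis queue.
input {
  tcp { port => 5140 codec => "json_lines" }
  udp { port => 5140 codec => "json_lines" }
}
output {
  redis {
    host      => "queue.internal.example.com"  # hypothetical central queue host
    data_type => "list"
    key       => "logstash"
  }
}
```

The app writes a JSON line to localhost:5140 and returns immediately, which addresses the "adds time to the request" concern from the question.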
Centralized logging with logstash can take many forms. If you can install logstash on your log-producing nodes, here is one architecture that is quite valid:

    app → instance-logstash → central queue → parser-logstash tier → storage (such as Elasticsearch)
In this architecture, all of the filtering logic is housed in the parser-logstashes, leaving instance-logstash to be nothing but a shipper. The best part is that the parser-logstash tier can be scaled up and down as load warrants. This keeps the instance-logstash with a minimal memory footprint, so it won't compete with the application for resources.
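A matching parser-logstash could be sketched like this (the grok pattern, hostnames, and index layout are assumptions about your log format, not a definitive config):

```
# parser-logstash: pull raw events off the central queue, apply all the
# filtering logic here, then write to storage (Elasticsearch in this sketch).
input {
  redis {
    host      => "queue.internal.example.com"  # same hypothetical queue host
    data_type => "list"
    key       => "logstash"
  }
}
filter {
  # Example pattern only; replace with whatever matches your app's logs.
  grok { match => [ "message", "%{COMBINEDAPACHELOG}" ] }
}
output {
  elasticsearch { host => "es.internal.example.com" }
}
```

Because every parser pulls from the same queue, you scale this tier by simply starting more identical parser nodes.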
Since Logstash has a Loggly output plugin, you can still feed data there if you wish while also keeping a copy locally.
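For example (a sketch; the customer token and Elasticsearch host are placeholders), the output section of a parser-logstash can ship to both destinations at once:

```
output {
  elasticsearch { host => "es.internal.example.com" }   # keep a local copy
  loggly { key => "YOUR-LOGGLY-CUSTOMER-TOKEN" }        # and still feed Loggly
}
```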
Deciding between these two approaches (writing to local files first versus shipping directly) is best done by answering a few questions, the first being: what happens to your application if the log-receiver is unavailable?
Files are a method of loose coupling on the instance. If your answer to the first question is "the app pauses until the log-receiver is back", you probably don't want that sort of tight coupling. In that case, log files are a way to provide a buffer, and one that will survive instance restarts, if that's important to you.
It is keeping state on the instance, but it should be very short-lived state: the log shipper should drain events to the central queue system fast enough that you're never holding more than a few seconds' worth.
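A sketch of the file-buffered variant (paths and hosts are hypothetical): the files themselves are the buffer, and logstash's sincedb file records how far the shipper has read, so unshipped events survive a restart.

```
# instance-logstash reading local log files and relaying to the central queue.
input {
  file {
    path         => "/var/log/myapp/*.log"         # hypothetical app log path
    sincedb_path => "/var/lib/logstash/sincedb"    # read position survives restarts
  }
}
output {
  redis {
    host      => "queue.internal.example.com"
    data_type => "list"
    key       => "logstash"
  }
}
```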
If you are very sensitive to storage I/O and also very sensitive to TCP state, you can still queue-mediate up to a point. Install a local Redis instance, have your app ship to that, and have logstash pull from there and relay to the central queue. This buffers the app from problems with the central queue. That said, in some cases it's still better to ship directly to the central queue, if the app can be configured to do so.
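That local-Redis arrangement could be sketched like this (hosts and list keys are placeholders): the app pushes JSON events into Redis on 127.0.0.1, and a local logstash relays them onward.

```
# Local relay: pull events the app pushed into the local Redis instance,
# then forward them unmodified to the central queue.
input {
  redis {
    host      => "127.0.0.1"      # local Redis the app writes to
    data_type => "list"
    key       => "app-logs"       # hypothetical list the app pushes onto
  }
}
output {
  redis {
    host      => "queue.internal.example.com"
    data_type => "list"
    key       => "logstash"
  }
}
```

The local Redis holds only an in-memory buffer, so storage I/O stays minimal, at the cost of losing a few seconds of events if the box dies.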
Upvotes: 2