Viswesn

Reputation: 4880

Way to send partial logs to RabbitMQ before sending the complete log to Logstash/Elasticsearch

I have a couple of functions, and each function creates logs specific to one transaction. It is a multi-threaded application, so the entry into func1 can happen at any time for different transactions, but a single transaction always goes through func1, func2, and func3, in that order only.

func1(transactionId) {
     log("%d Now in func1", transactionId);
}

func2(transactionId) {
     log("%d Now in func2", transactionId);
}

func3(transactionId) {
     log("%d Now in func3", transactionId);
}

Now, I want to write to Logstash only once per transaction, as a single combined entry; that is:

 1 Now in func1 Now in func2 Now in func3

and then this needs to finally go to Elasticsearch.

I was thinking of writing the partial transaction log to a temporary RabbitMQ queue and then, on completion of the whole transaction, committing it to the RabbitMQ producer queue to send the message to Logstash.

Like

func1(transactionId) {
     add2RMQ(transactionId, "Now in func1");
}

func2(transactionId) {
     add2RMQ(transactionId, "Now in func2");
}

func3(transactionId) {
     add2RMQ(transactionId, "Now in func3");
     /* Last point of transaction */
     commit2RMQ(transactionId);
}

By the time commit2RMQ executes, Logstash should receive the complete message specific to the transaction and write it to Elasticsearch.

Question:

  1. What is the right solution for sending the data specific to a transaction to Elasticsearch all at once?
  2. Can we address this with RabbitMQ? If so, what are the right APIs to use for this?
  3. Is there any way to achieve the same without RabbitMQ, using only Logstash and Elasticsearch?
  4. I don't want to use the Elasticsearch update API, as it may incur lots of search operations for every log message specific to a transaction.

Upvotes: 0

Views: 176

Answers (1)

Val

Reputation: 217254

Trying to aggregate different log lines pertaining to a single transaction isn't a simple problem to solve, especially if you add a message queueing system to the mix as an intermediary store of logs to aggregate. I would go a different way which doesn't involve another sub-system like RabbitMQ.

Besides, if you try to concatenate several log lines into a single one, you lose the granular information each line provides, such as the amount of time each function took to execute. Also, what happens if func2 (or func3) throws an exception? Should you store a partial log made up of only func1 (or only func1 and func2)?

What I'm about to write can probably be transposed to any language and any logging solution, but for the purpose of illustration I'm assuming your program is written in Java and you're using Log4J.

So I would leverage Log4J's Mapped Diagnostic Context (MDC) in order to store your transaction ID (and potentially other data such as the user name, etc.) in each of your log lines. That way you can easily retrieve all the log lines pertaining to a single transaction. The advantage of doing so is that you don't have to aggregate anything; you just provide enough context information so that Kibana can do it for you later on.

In your pseudo code, you're adding the transaction ID directly to your message. The advantage of using the MDC for this instead of logging the ID into your message is that it saves you from parsing all your messages in Logstash to re-discover the transaction ID that you already knew when you created the log line.

So the idea is that in your code, as soon as you have a transaction ID, you add it to the current per-thread logging context like this:

import org.apache.log4j.MDC;

...
func1(transactionId) {
     // store the transaction ID in the current thread's logging context;
     // every subsequent log line on this thread will carry it
     MDC.put("transactionID", transactionId);
     log("Now in func1");
}

func2(transactionId) {
     // the ID is still in the per-thread context, no need to re-add it
     log("Now in func2");
}

func3(transactionId) {
     log("Now in func3");
     // last point of the transaction: remove the ID so it doesn't leak
     // into the next transaction handled by this thread
     MDC.remove("transactionID");
}

Then, in your Log4J configuration file, you can add the %X{transactionID} pattern to your appender's layout in order to log it. In this case I'm adding it just after the thread name, but you can put it wherever you like:

log4j.appender.consoleAppender.layout.ConversionPattern = %d [%t] [%X{transactionID}] %5p %c - %m%n
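For completeness, here is a minimal log4j.properties sketch wiring that layout into a console appender; the appender name consoleAppender matches the line above, and the root logger level is just an assumption to adapt to your setup:

# minimal sketch, assuming Log4J 1.x
log4j.rootLogger=INFO, consoleAppender
log4j.appender.consoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.consoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.consoleAppender.layout.ConversionPattern=%d [%t] [%X{transactionID}] %5p %c - %m%n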

Your logs will look similar to this:

2015-09-28T05:07:28.425Z [http-8084-2] [625562271762]  INFO YourClass - Now in func1
2015-09-28T05:07:29.776Z [http-8084-2] [625562271762]  INFO YourClass - Now in func2
2015-09-28T05:07:30.652Z [http-8084-2] [625562271762]  INFO YourClass - Now in func3
                                              ^
                                              |
                                  the transaction ID is here

When you have log lines like this, it's then a piece of cake to retrieve the transaction ID via a Logstash grok filter and store it in its own transactionID field in your Elasticsearch index. In Kibana, you can then search for a transaction ID, sort by timestamp desc, and you'll have all the context of that transaction showing up.
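For illustration, a grok filter matching the layout above might look like the following sketch; the field names timestamp, thread, level, class, and msg are my own choices, only transactionID matters here:

filter {
  grok {
    # parse: 2015-09-28T05:07:28.425Z [http-8084-2] [625562271762]  INFO YourClass - Now in func1
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] \[%{DATA:transactionID}\]\s+%{LOGLEVEL:level} %{JAVACLASS:class} - %{GREEDYDATA:msg}"
    }
  }
}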

Give it a shot!

Upvotes: 3
