Drake Guan

Reputation: 15152

How to do Log Mining?

To figure out (or guess) something about one of our proprietary desktop tools, built with wxPython, I added a logging decorator to several relevant class methods. Each log record looks like the following:

[Image: log records of one application, as displayed in phpMyAdmin]
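For readers unfamiliar with the approach, a logging decorator of this kind could look roughly like the sketch below. It is an illustration only: the logger name `usage`, the helper `log_call`, the `Editor` class, and the logged fields (`klass`, `method`, `elapsed`) are all assumptions, not the actual tool or its schema.

```python
import functools
import logging
import time

# Hypothetical logger name; the real tool may write to a database instead.
logger = logging.getLogger("usage")

def log_call(method):
    """Log the class name, method name, and duration of each call."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        start = time.time()
        try:
            return method(self, *args, **kwargs)
        finally:
            # Runs whether the method returned or raised.
            logger.info("klass=%s method=%s elapsed=%.3f",
                        type(self).__name__, method.__name__,
                        time.time() - start)
    return wrapper

class Editor:
    """Stand-in for a class in the application (hypothetical)."""
    @log_call
    def save(self, path):
        return f"saved {path}"
```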

There are now more than 3M log records in the database, and I have started to wonder: what can I get out of all this data? I can get some information like:

I guess the relevant technique is log mining. Does anyone have ideas about what further information I could extract from this really simple log? I'd really like to get more out of it.

Upvotes: 2

Views: 1461

Answers (2)

lebowski

Reputation: 316

Patterns!

Look for patterns preceding failures. Say a failure was logged; consider exploring these questions:

  • What was the sequence of klass-method combos that preceded it?
  • What about other combos?
  • Is it always the same sequence that precedes the same failures?
  • Does a sequence of minor failures precede a major failure?
  • etc
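As a rough sketch of the first and third questions, the snippet below counts which klass-method sequences appear immediately before each failure. The record layout `(timestamp, klass, method, status)`, the `status` values, and the sample data are assumptions made for illustration; the real log may look different.

```python
from collections import Counter

# Hypothetical records, ordered by time: (timestamp, klass, method, status).
records = [
    (1, "MainFrame", "on_open", "ok"),
    (2, "Editor", "load", "ok"),
    (3, "Editor", "save", "fail"),
    (4, "MainFrame", "on_open", "ok"),
    (5, "Editor", "load", "ok"),
    (6, "Editor", "save", "fail"),
    (7, "Exporter", "run", "ok"),
]

def preceding_sequences(records, window=2):
    """Count the klass.method sequences seen in the `window` records before each failure."""
    counts = Counter()
    for i, (_, klass, method, status) in enumerate(records):
        if status != "fail":
            continue
        prefix = tuple(f"{k}.{m}" for _, k, m, _ in records[max(0, i - window):i])
        counts[(f"{klass}.{method}", prefix)] += 1
    return counts

patterns = preceding_sequences(records)
for (failure, prefix), n in patterns.most_common():
    print(f"{failure} preceded {n}x by {' -> '.join(prefix)}")
```

If the same prefix keeps showing up before the same failure, that is a hint the sequence is worth reproducing by hand.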

One way to compare patterns is as follows:

  1. Classify each message
  2. Represent each class/type with a unique ID, so you now have a sequence of IDs
  3. Slice the sequence into time periods to compare
  4. Compare the slices (arrays of IDs) with a diff algorithm
  5. Retain samples of periods to establish the common patterns, then compare new samples for the same periods to establish a degree of anomaly
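The five steps above could be sketched as follows, using `difflib.SequenceMatcher` from the Python standard library as the diff algorithm. That choice, the helper names (`to_ids`, `slice_by_period`, `anomaly_score`), the 10-unit period, and the sample log are all assumptions for illustration; any sequence-diff method would fit step 4.

```python
from difflib import SequenceMatcher

# Steps 1-2: map each message class (here a "klass.method" string) to an integer ID.
def to_ids(events, vocab):
    return [vocab.setdefault(e, len(vocab)) for e in events]

# Step 3: slice a list of (timestamp, event) pairs into fixed-width time periods.
def slice_by_period(timed_events, period):
    slices = {}
    for ts, event in timed_events:
        slices.setdefault(ts // period, []).append(event)
    return [slices[k] for k in sorted(slices)]

# Steps 4-5: diff a new slice against a retained baseline slice.
# 0.0 means identical sequences; values near 1.0 mean little in common.
def anomaly_score(baseline_ids, sample_ids):
    matcher = SequenceMatcher(None, baseline_ids, sample_ids, autojunk=False)
    return 1.0 - matcher.ratio()

# Hypothetical log: timestamps in arbitrary units.
log = [(0, "A.open"), (1, "A.load"), (2, "A.save"),
       (10, "A.open"), (11, "A.load"), (12, "A.save"),
       (20, "A.open"), (21, "B.crash"), (22, "B.crash")]

vocab = {}
periods = [to_ids(s, vocab) for s in slice_by_period(log, period=10)]
baseline = periods[0]
scores = [anomaly_score(baseline, p) for p in periods[1:]]
print(scores)
```

In this toy data the second period matches the baseline exactly and scores 0.0, while the third period shares only its first event with the baseline and scores noticeably higher.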

Upvotes: 1

samplebias

Reputation: 37919

SpliFF is right, you'll have to decide which questions are important to you and then figure out if you're collecting the right data to answer them. Making sense of this sort of operational data can be very valuable.

You probably want to start by seeing if you can answer some basic questions, and then move on to the tougher stuff once you have your log collection and analysis workflow established. Some longer-term questions you might consider:

  • What are the most common and most severe bugs encountered "in the wild", ranked by frequency and impact? Data: capture stack traces / call points and method arguments if possible.
  • Can you simplify some of the common actions your users perform? If X is the most common, can the number of steps be reduced or can individual steps be simplified? Data: Sessions, clickstreams for the common workflows. Features ranked by frequency of use, number and complexity of steps.
  • Which features are confusing or have conflicting options that lead to user mistakes? Data: sessions where the user backs up several times to repeat a step, or starts over from the beginning, may be telling.
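As a small illustration of the last two points, the sketch below ranks actions by frequency of use and flags sessions in which the user repeats steps. The session IDs, action names, and the threshold of two repeats are hypothetical; real sessions would first have to be reconstructed from the log.

```python
from collections import Counter

# Hypothetical sessions: the ordered list of actions each user performed.
sessions = {
    "s1": ["open", "import", "configure", "export"],
    "s2": ["open", "import", "configure", "import", "configure", "import", "export"],
    "s3": ["open", "export"],
}

# Features ranked by frequency of use across all sessions.
usage = Counter(action for actions in sessions.values() for action in actions)

def repeat_count(actions):
    """Number of times an action recurs after it was already performed in the session."""
    seen, repeats = set(), 0
    for action in actions:
        if action in seen:
            repeats += 1
        seen.add(action)
    return repeats

# Sessions with many repeated steps may point at confusing features.
suspicious = {sid: repeat_count(a) for sid, a in sessions.items()
              if repeat_count(a) >= 2}

print(usage.most_common())
print(suspicious)
```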

You may also want to notify users that data is being collected for quality purposes, and even solicit some feedback from within the app's interface.

Upvotes: 1
