Reputation: 11
I am working in an environment with many different log files and many differently formatted log lines (100+) within each.
I have used grok extensively to reveal all kinds of exciting trends in these, but I was wondering whether there is a simple, generic plot that could give me some insight into the frequency of words in any logfile.
Is it possible in Kibana 4 beta 3 to plot a count of unique words from the @message field? (I'm not interested in the numbers themselves; I work with bandwidths and frequencies, which change constantly.)
Consider the following logfile:
29/01/2015 17:45:00 INFO Loading Banana 3218763kbs Retrieved - null /absy
29/01/2015 17:45:01 DEBUG Apple Interrogation, Completed 25
29/01/2015 17:45:02 EXCEPTION! Fruit rotting in 34 days
29/01/2015 17:45:03 Critical word of the day is pineapple 123456789
Imagine 200 more variations of the above.
I would like to count each word returned by:
cut -d" " -f3- logfile | tr -d '0-9'
i.e. remove the timestamp, delete the numbers, and then count the frequency of each word. A pie chart or count of common terms within a logfile I may never have seen before would be extremely useful:
BANANA 788
Help 692
Exception 678
Orange 53
Retrieved 287
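For reference, the full pipeline I have in mind can be sketched on the command line. This is just a sketch; the two sample lines stand in for the real logfile, and the exact field positions assume the timestamp always occupies the first two space-separated fields:

```shell
# Strip the timestamp (fields 1-2), delete digits, split on spaces into
# one word per line, drop empty lines, then count and rank each word.
printf '%s\n' \
  '29/01/2015 17:45:00 INFO Loading Banana 3218763kbs' \
  '29/01/2015 17:45:01 DEBUG Banana Done 25' \
| cut -d" " -f3- | tr -d '0-9' | tr -s ' ' '\n' | grep . | sort | uniq -c | sort -rn
```

With a real file you would replace the `printf` with `cut -d" " -f3- logfile`; the top of the output is the most frequent word.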
I thought a "Significant Terms" aggregation on the message field would help, but only if I could exclude the numbers, which doesn't seem possible.
Thanks!
Upvotes: 0
Views: 1324
Reputation: 11
The answer was to use a "Terms" aggregation instead of "Significant Terms", with the include pattern [A-Za-z]{2,} to match only words (not numbers) of two or more letters. Cool!
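For anyone querying Elasticsearch directly rather than through the Kibana UI, the equivalent raw terms aggregation looks roughly like the fragment below. This is a sketch: the aggregation name `common_words`, the field name `message`, and the `size` are assumptions for illustration.

```json
{
  "aggs": {
    "common_words": {
      "terms": {
        "field": "message",
        "include": "[A-Za-z]{2,}",
        "size": 20
      }
    }
  }
}
```

The `include` pattern is applied per term, so purely numeric tokens never make it into the buckets.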
Upvotes: 1