Максим

Reputation: 23

Parse a log file in Ruby

I need your help. I'm writing a Ruby script that parses a log file, but I can't work out a simple regular expression for this log format. Here are some example lines from the log:

2014-01-09T06:16:53.766841+00:00 heroku[router]: at=info method=POST path=/logs/save_personal_data host=services.pocketplaylab.com fwd="5.13.87.91" dyno=web.10 connect=1ms service=42ms status=200 bytes=16
2014-01-09T06:16:53.772938+00:00 heroku[router]: at=info method=POST path=/api/users/100002844291023 host=services.pocketplaylab.com fwd="46.195.178.244" dyno=web.6 connect=2ms service=43ms status=200 bytes=52
2014-01-09T06:16:53.765430+00:00 heroku[router]: at=info method=GET path=/api/users/100005936523817/get_friends_progress host=services.pocketplaylab.com fwd="5.13.87.91" dyno=web.11 connect=1ms service=47ms status=200 bytes=7498
2014-01-09T06:16:53.760472+00:00 heroku[router]: at=info method=POST path=/api/users/1770684197 host=services.pocketplaylab.com fwd="74.139.217.81" dyno=web.5 connect=1ms service=17ms status=200 bytes=681
2014-01-09T06:15:15.893505+00:00 heroku[router]: at=info method=GET path=/api/users/1686318645/get_friends_progress host=services.pocketplaylab.com fwd="1.125.42.139" dyno=web.3 connect=8ms service=90ms status=200 bytes=7534
2014-01-09T06:16:53.768188+00:00 heroku[router]: at=info method=GET path=/api/users/100005936523817/get_friends_score host=services.pocketplaylab.com fwd="5.13.87.91" dyno=web.13 connect=2ms service=46ms status=200 bytes=9355
2014-01-09T06:15:17.858874+00:00 heroku[router]: at=info method=POST path=/api/users/1145906359 host=services.pocketplaylab.com fwd="107.220.72.53" dyno=web.14 connect=2ms service=362ms status=200 bytes=52
2014-01-09T06:16:53.797975+00:00 heroku[router]: at=info method=GET path=/api/users/100000622081059/count_pending_messages host=services.pocketplaylab.com fwd="174.239.6.42" dyno=web.12 connect=1ms service=20ms status=200 bytes=33
2014-01-09T06:16:53.796869+00:00 heroku[router]: at=info method=GET path=/api/users/100004683190675/get_friends_score host=services.pocketplaylab.com fwd="99.138.1.64" dyno=web.12 connect=2ms service=55ms status=200 bytes=16881

My code (updated):

#!/usr/bin/env ruby
require 'csv'

sample_logs = File.readlines "/home/railsroger/Playlab_test/sample.log"

file_name = ARGV.last
result_parse = []
CSV.open(file_name, "wb") do |csv_line|
  csv_line << ['URL', 'Dyno', 'Connect', 'Service']
  sample_logs.each_with_index do |sample_log, idx|
    path    = sample_log.scan(/path=([^\s]+)/).first.first
    dyno    = sample_log.scan(/dyno=([^\s]+)/).first.first
    connect = sample_log.scan(/connect=([^\s]+)/).first.first
    service = sample_log.scan(/service=([^\s]+)/).first.first


    result_parse = [path, dyno, connect, service]

    csv_line << result_parse    

  end

end

Thanks.

Upvotes: 2

Views: 3951

Answers (3)

Roland Studer

Reputation: 4415

OK, to write your regex, you need to find all the pairs of the form some_variable=some_data.

Here is how you can do that:

/\S*=\S*/ #
 \S*      # match any non-whitespace-character, 0-n times
    =     # match the equal sign    
     \S*  # match any non-whitespace-character, 0-n times

This matches the pairs. To extract the data, use capture groups: enclose each part you want to extract in parentheses (xxx), here the name of the variable and the value.

/(\S*)=(\S*)/  
 (\S*)         # capture the name
       (\S*)   # capture the value
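As a quick check of the capture groups, here is a minimal sketch that applies the pattern to a single pair taken from the sample log. The variable names `pair`, `name`, and `value` are only illustrative and do not come from the question.

```ruby
# A single key=value pair copied from the sample log.
pair = 'dyno=web.10'

# String#match returns a MatchData object (or nil when nothing matches).
match = pair.match(/(\S*)=(\S*)/)

name  = match[1] # first capture group: the variable name
value = match[2] # second capture group: the value

puts "#{name} -> #{value}"
```

Index 0 of the MatchData object holds the whole matched text; the capture groups start at index 1.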

So for every log line you can do:

line_of_log.scan(/(\S*)=(\S*)/)

To see what happens, and whenever you write regular expressions, I recommend trying them out in a tool like https://regex101.com/; it really helps to understand what is going on.

The scan call returns an array of arrays like this:

[["at", "info"],
 ["method", "POST"],
 ["path", "/api/online/platforms/facebook_canvas/users/100002266342173/add_ticket"],
 ["host", "services.pocketplaylab.com"],
 ["fwd", "\"94.66.255.106\""],
 ["dyno", "web.12"],
 ["connect", "12ms"],
 ["service", "21ms"],
 ["status", "200"],
 ["bytes", "78"]]

Now you can iterate through the array and build some kind of object or hash to work with.

scanresult.inject({}) do |obj, pair|
  obj[pair[0].to_sym] = pair[1]
  obj
end
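Putting the steps of this answer together, a sketch of the whole flow for one line might look like the following. The helper name `parse_pairs` is hypothetical, and the sample line is copied from the question; note that the `fwd` value keeps its surrounding quotes.

```ruby
# Hypothetical helper (not part of the original answer): turns one router
# log line into a hash keyed by symbol.
def parse_pairs(line)
  line.scan(/(\S*)=(\S*)/).each_with_object({}) do |(name, value), fields|
    fields[name.to_sym] = value
  end
end

line = '2014-01-09T06:16:53.766841+00:00 heroku[router]: at=info method=POST ' \
       'path=/logs/save_personal_data host=services.pocketplaylab.com ' \
       'fwd="5.13.87.91" dyno=web.10 connect=1ms service=42ms status=200 bytes=16'

fields = parse_pairs(line)

# The four columns the question writes to the CSV file.
puts fields[:path]
puts fields[:dyno]
puts fields[:connect]
puts fields[:service]
```

The timestamp and the `heroku[router]:` prefix contain no equal sign, so the scan skips them without any extra handling.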

Upvotes: 4

marmeladze

Reputation: 6562

I am not a regex expert, and I'm aware that the code below smells -)) but you can take it as a starting point.

lines = File.readlines 'sample.log'

lines.each_with_index do |line, idx|
  path    = line.scan(/path=([^\s]+)/).first.first
  dyno    = line.scan(/dyno=([^\s]+)/).first.first
  connect = line.scan(/connect=([^\s]+)/).first.first
  service = line.scan(/service=([^\s]+)/).first.first
  puts "#{path} #{dyno} #{connect} #{service}"
end

Link to repl

Edit: the version below was suggested by Wiktor Stribiżew; it is more concise and better, and should be preferred over mine. I'm keeping the code above for historical reasons -))

lines.each_with_index do |line, idx|
  path    = line[/path=([^\s]+)/, 1]
  dyno    = line[/dyno=([^\s]+)/, 1]
  connect = line[/connect=([^\s]+)/, 1]
  service = line[/service=([^\s]+)/, 1]
  puts "#{path} #{dyno} #{connect} #{service}"
end

Upvotes: 2

Teoulas

Reputation: 2963

The solution is to use named captures: String#match(/dyno=(?<dyno>\S+)/) will capture the dyno string. You can extend the regexp to match more fields.

You can experiment with the example here: http://rubular.com/r/4XcovTiqh3. With a bit of trial and error you can find the right regexp.

Update after you added your code:

parser = log.match(/dyno=(?<dyno>\S+)/) 

will return a MatchData object from which you can get the matched dyno with:

parser['dyno']

Once you finalize your regexp to capture more from each line, and if you use Ruby 2.4 or later, you can also use named_captures to get a hash of all matched groups.

See how it works: https://repl.it/repls/SpectacularBewitchedPolygon
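As an illustration of named_captures with more than one group, here is a minimal sketch. The pattern is only an assumption about what a finalized regexp could look like: it relies on the fields appearing in the same order as in the sample log, and it strips the `ms` suffix from the timings. The sample text is the key=value part of one line from the question.

```ruby
log = 'at=info method=GET path=/api/users/1686318645/get_friends_progress ' \
      'host=services.pocketplaylab.com fwd="1.125.42.139" dyno=web.3 ' \
      'connect=8ms service=90ms status=200 bytes=7534'

# Illustrative pattern (an assumption, not from the answer): expects
# path, dyno, connect and service in this order on every line.
pattern = /path=(?<path>\S+).*dyno=(?<dyno>\S+)\s+connect=(?<connect>\d+)ms\s+service=(?<service>\d+)ms/

parser = log.match(pattern)

# MatchData#named_captures (Ruby 2.4+) returns a hash with string keys.
captures = parser.named_captures
p captures
```

Because match returns nil for a line that does not fit the pattern, real log processing would need a nil check before calling named_captures.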

Upvotes: 0
