Reputation: 23
I need your help. I'm writing a script in Ruby that parses a log file, but I can't work out a simple regular expression for this log format. Here are some example lines from the log:
2014-01-09T06:16:53.766841+00:00 heroku[router]: at=info method=POST path=/logs/save_personal_data host=services.pocketplaylab.com fwd="5.13.87.91" dyno=web.10 connect=1ms service=42ms status=200 bytes=16
2014-01-09T06:16:53.772938+00:00 heroku[router]: at=info method=POST path=/api/users/100002844291023 host=services.pocketplaylab.com fwd="46.195.178.244" dyno=web.6 connect=2ms service=43ms status=200 bytes=52
2014-01-09T06:16:53.765430+00:00 heroku[router]: at=info method=GET path=/api/users/100005936523817/get_friends_progress host=services.pocketplaylab.com fwd="5.13.87.91" dyno=web.11 connect=1ms service=47ms status=200 bytes=7498
2014-01-09T06:16:53.760472+00:00 heroku[router]: at=info method=POST path=/api/users/1770684197 host=services.pocketplaylab.com fwd="74.139.217.81" dyno=web.5 connect=1ms service=17ms status=200 bytes=681
2014-01-09T06:15:15.893505+00:00 heroku[router]: at=info method=GET path=/api/users/1686318645/get_friends_progress host=services.pocketplaylab.com fwd="1.125.42.139" dyno=web.3 connect=8ms service=90ms status=200 bytes=7534
2014-01-09T06:16:53.768188+00:00 heroku[router]: at=info method=GET path=/api/users/100005936523817/get_friends_score host=services.pocketplaylab.com fwd="5.13.87.91" dyno=web.13 connect=2ms service=46ms status=200 bytes=9355
2014-01-09T06:15:17.858874+00:00 heroku[router]: at=info method=POST path=/api/users/1145906359 host=services.pocketplaylab.com fwd="107.220.72.53" dyno=web.14 connect=2ms service=362ms status=200 bytes=52
2014-01-09T06:16:53.797975+00:00 heroku[router]: at=info method=GET path=/api/users/100000622081059/count_pending_messages host=services.pocketplaylab.com fwd="174.239.6.42" dyno=web.12 connect=1ms service=20ms status=200 bytes=33
2014-01-09T06:16:53.796869+00:00 heroku[router]: at=info method=GET path=/api/users/100004683190675/get_friends_score host=services.pocketplaylab.com fwd="99.138.1.64" dyno=web.12 connect=2ms service=55ms status=200 bytes=16881
My code (updated):
#!/usr/bin/env ruby
require 'csv'
sample_logs = File.readlines "/home/railsroger/Playlab_test/sample.log"
file_name = ARGV.last
result_parse = []
CSV.open(file_name, "wb") do |csv_line|
  csv_line << ['URL', 'Dyno', 'Connect', 'Service']
  sample_logs.each_with_index do |sample_log, idx|
    path = sample_log.scan(/path=([^\s]+)/).first.first
    dyno = sample_log.scan(/dyno=([^\s]+)/).first.first
    connect = sample_log.scan(/connect=([^\s]+)/).first.first
    service = sample_log.scan(/service=([^\s]+)/).first.first
    result_parse = [path, dyno, connect, service]
    csv_line << result_parse
  end
end
Thanks.
Upvotes: 2
Views: 3951
Reputation: 4415
Ok, to write your regex, what you need is to find all the pairs of the form some_variable=some_data.
Here is how you can do that:
/\S*=\S*/ #
\S* # match any non-whitespace-character, 0-n times
= # match the equal sign
\S* # match any non-whitespace-character, 0-n times
This will match the pairs. To extract the data, you use capture groups.
You enclose what you want to extract in parentheses (xxx), once for the name of the variable and once for the value.
/(\S*)=(\S*)/
(\S*) # capture the name
(\S*) # capture the value
So for every log line you can do:
line_of_log.scan(/(\S*)=(\S*)\s/)
To see what happens, and whenever you build a regular expression, I recommend trying it out in a tool like https://regex101.com/; it really helps to understand what is going on.
The scan call above returns an array of arrays like this:
[["at", "info"],
["method", "POST"],
["path", "/api/online/platforms/facebook_canvas/users/100002266342173/add_ticket"],
["host", "services.pocketplaylab.com"],
["fwd", "\"94.66.255.106\""],
["dyno", "web.12"],
["connect", "12ms"],
["service", "21ms"],
["status", "200"],
["bytes", "78"]]
Now you can iterate through the array and create some kind of object or hash to work with.
scanresult.inject({}) do |obj, pair|
  obj[pair[0].to_sym] = pair[1]
  obj
end
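Putting the scan and the hash-building step together, a minimal sketch could look like the following. The helper name parse_pairs is made up for illustration. It also swaps in [^\s=]+ for the key instead of \S*, on the assumption that a value might itself contain an = sign (for example a path with a query string), and it strips the double quotes around the fwd value.

```ruby
# Hypothetical helper: turn one router log line into a Hash of key/value pairs.
def parse_pairs(line)
  # The key is any run of characters that are neither whitespace nor "=",
  # the value is any run of non-whitespace characters after the "=".
  pairs = line.scan(/([^\s=]+)=(\S+)/)
  # Symbol keys, and values without surrounding double quotes.
  pairs.map { |key, value| [key.to_sym, value.delete('"')] }.to_h
end

# Sample line taken from the question.
line = '2014-01-09T06:16:53.766841+00:00 heroku[router]: at=info method=POST ' \
       'path=/logs/save_personal_data host=services.pocketplaylab.com ' \
       'fwd="5.13.87.91" dyno=web.10 connect=1ms service=42ms status=200 bytes=16'

fields = parse_pairs(line)
puts fields[:path]
puts fields[:dyno]
puts fields[:fwd]
```

The timestamp and the heroku[router]: prefix contain no = sign, so they are simply skipped by scan.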
Upvotes: 4
Reputation: 6562
I am not a regex expert, and I am aware that the code below smells -)) but you can take it as a starting point.
lines = File.readlines 'sample.log'
lines.each_with_index do |line, idx|
  path = line.scan(/path=([^\s]+)/).first.first
  dyno = line.scan(/dyno=([^\s]+)/).first.first
  connect = line.scan(/connect=([^\s]+)/).first.first
  service = line.scan(/service=([^\s]+)/).first.first
  puts "#{path} #{dyno} #{connect} #{service}"
end
Edit suggested by Wiktor Stribiżew, which is obviously more concise and better. I would prefer it over mine. Keeping the above code for historical reasons -))
lines.each_with_index do |line, idx|
  path = line[/path=([^\s]+)/, 1]
  dyno = line[/dyno=([^\s]+)/, 1]
  connect = line[/connect=([^\s]+)/, 1]
  service = line[/service=([^\s]+)/, 1]
  puts "#{path} #{dyno} #{connect} #{service}"
end
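One practical difference between the two versions is how they behave on a line that lacks a key. scan returns an empty array when nothing matches, so .first.first fails on nil, while line[regex, 1] just returns nil. Here is a small sketch of that difference. The helper extract_fields, the FIELDS constant and the "odd" line are made up for illustration, and the sample line is shortened from the question.

```ruby
FIELDS = %w[path dyno connect service].freeze

# Hypothetical helper: pull the selected fields out of one log line.
# String#[] with a regexp and a group index returns nil when nothing matches.
def extract_fields(line)
  FIELDS.map { |name| line[/\b#{name}=(\S+)/, 1] }
end

good = 'at=info method=GET path=/api/users/1686318645/get_friends_progress ' \
       'dyno=web.3 connect=8ms service=90ms status=200 bytes=7534'
odd  = 'a made-up line with none of the expected keys'

good_fields = extract_fields(good)
odd_fields  = extract_fields(odd)
p good_fields
p odd_fields # all nil, no exception

# The scan-based form fails on the same input because [].first is nil.
scan_error = nil
begin
  odd.scan(/path=(\S+)/).first.first
rescue NoMethodError => e
  scan_error = e
  puts "scan version raised #{e.class}"
end
```

Whether a nil should be skipped, logged or written as an empty CSV cell depends on what the report is for.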
Upvotes: 2
Reputation: 2963
The solution is to use named captures:
String#match(/dyno=(?<dyno>\S+)/)
will capture the dyno string. You can expand the regexp to match more.
You can fiddle with the example here: http://rubular.com/r/4XcovTiqh3 - with a bit of trial and error you can find the right regexp
parser = log.match(/dyno=(?<dyno>\S+)/)
will return a MatchData object from which you can get the matched dyno with:
parser['dyno']
Once you finalize your regexp to capture more from each line, and if you use Ruby 2.4 or later, you can also use named_captures to get a nice hash with all matched groups.
See how it works: https://repl.it/repls/SpectacularBewitchedPolygon
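As a rough sketch of what the expanded regexp might look like, here is one pattern with a named group per field. The constant name ROUTER_LINE is made up, and the pattern assumes the keys always appear in the order shown in the question's sample lines (path, then later dyno, connect, service); if that order can vary, a per-key lookup is safer.

```ruby
# One named group per field; assumes the keys appear in this order on every line.
ROUTER_LINE = /path=(?<path>\S+).*?dyno=(?<dyno>\S+)\s+connect=(?<connect>\S+)\s+service=(?<service>\S+)/

# Sample line taken from the question.
log = '2014-01-09T06:16:53.772938+00:00 heroku[router]: at=info method=POST ' \
      'path=/api/users/100002844291023 host=services.pocketplaylab.com ' \
      'fwd="46.195.178.244" dyno=web.6 connect=2ms service=43ms status=200 bytes=52'

match = log.match(ROUTER_LINE) # nil when the line does not fit the pattern
fields = match ? match.named_captures : {} # Hash with String keys (Ruby 2.4+)

puts match['dyno'] if match
puts fields['service']
```

Checking for nil before using the MatchData avoids a NoMethodError on lines that do not match.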
Upvotes: 0