C.N.N.

Reputation: 91

Regex matching a multi-line pattern

I have a log file test_list.txt that looks like this:

Processing SampleDocumentController#index (for 101.101.101.101 at 2020-12-12 12:00:00) [POST]
  Session ID: sdfgs923jks0dm23mlasf3da9asfjvyur
  Parameters: {"format"=>"xml", "controller"=>"sample_document", "q"=>"last_updated_at", "action"=>"index"}
Completed in 0.00529 (189 reqs/sec) | Rendering: 0.00007 (1%) | DB: 0.00126 (23%) | 200 OK [https://www.bars.com/sample/sample_document.lmx?]

I have a regex to capture the method and the session id of the log file:

regex = /\[([A-Z]+)\]\D+([a-zA-Z0-9]{32}$)/i

When I test the regex on its own, it works fine and returns the captured strings "POST" and "sdfgs923jks0dm23mlasf3da9asfjvyur". However, with the following script test.rb:

File.open("test_list.txt").each do |li|
  if !li.nil?
    x = li.match(regex)
    if !x.nil?
      a, b = x.captures
      p a
      p b
    end
  end
end

Running ruby test.rb on the command line does not print anything.

Any idea why it doesn't work with the script?

Upvotes: 1

Views: 92

Answers (3)

tukan

Reputation: 17337

Sawa is right. I think you misunderstood String#match, which you usually want to use in a boolean context.

You probably want to use scan. In your case you can scan it like this:

string = File.read("test_list.txt")
p string.scan(/(\[[A-Z]+\])|((?<=Session ID: )[a-zA-Z0-9]{33})/)

which will result in something like this:

[["[POST]", nil], [nil, "sdfgs923jks0dm23mlasf3da9asfjvyur"]]
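Each match fills only one of the two alternation groups, which is where the nil padding comes from. A minimal sketch of one way to drop it, assuming the first two lines of the sample log are held in a string (the names ALT_SAMPLE, ALT_REGEX and matched_values are hypothetical, introduced only for illustration):

```ruby
# First two lines of the sample log from the question.
ALT_SAMPLE = <<~LOG
  Processing SampleDocumentController#index (for 101.101.101.101 at 2020-12-12 12:00:00) [POST]
    Session ID: sdfgs923jks0dm23mlasf3da9asfjvyur
LOG

# The alternation regex from this answer, unchanged.
ALT_REGEX = /(\[[A-Z]+\])|((?<=Session ID: )[a-zA-Z0-9]{33})/

# scan returns one [group1, group2] array per match, with nil for the
# group that did not take part. flatten + compact removes the nils.
def matched_values(text)
  text.scan(ALT_REGEX).flatten.compact
end

p matched_values(ALT_SAMPLE)
```

This keeps the matched strings in the order they appear in the text, but it no longer records which group produced which value.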

You can play around with the expression on regular

Upvotes: 1

C.N.N.

Reputation: 91

I got it to work exactly as I wanted by doing this:

string = File.read("test_list.txt")
regex = /\[([A-Z]+)\]\D+([a-zA-Z0-9]{32}$)/

string.scan(regex).each do |x|
  puts x
end

If, say, I only wanted to print a specific capture group, I would index into x:

puts x[0]

or

puts x[1]
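As an alternative to indexing, the block parameters can destructure each result, since scan yields one array of captures per match. The following is only a sketch under assumptions: the second log entry and its session ID are made up, the regex anchors on the literal label "Session ID: " instead of a fixed length, and the names SCAN_SAMPLE, PAIR_REGEX and method_session_pairs are hypothetical.

```ruby
# Hypothetical two-request sample in the same format as the question's log.
# The second entry and its session ID are invented for illustration.
SCAN_SAMPLE = <<~LOG
  Processing SampleDocumentController#index (for 101.101.101.101 at 2020-12-12 12:00:00) [POST]
    Session ID: sdfgs923jks0dm23mlasf3da9asfjvyur
  Processing SampleDocumentController#index (for 101.101.101.101 at 2020-12-12 12:00:05) [GET]
    Session ID: abc123
LOG

# Assumption: the session ID always follows the literal label "Session ID: ".
PAIR_REGEX = /\[([A-Z]+)\]\s+Session ID: ([a-zA-Z0-9]+)$/

# With capture groups, scan yields one [method, session_id] array per match,
# so the block can name both values instead of using x[0] and x[1].
def method_session_pairs(text)
  text.scan(PAIR_REGEX).map do |http_method, session_id|
    { http_method: http_method, session_id: session_id }
  end
end

method_session_pairs(SCAN_SAMPLE).each do |pair|
  puts "#{pair[:http_method]} #{pair[:session_id]}"
end
```

Naming the captures makes it harder to mix up the two indexes when the regex later gains another group.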

Upvotes: 1

sawa

Reputation: 168071

It is because your regex matches only when both the method and the session id are present. In your log file, they are located on different lines, and none of the lines includes both. Hence, none of the lines matches the regex.
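A minimal sketch of this difference, assuming the first two lines of the sample log are held in a string rather than read from test_list.txt (the names SAMPLE_LOG, QUESTION_REGEX, matching_lines and whole_text_match? are hypothetical, introduced only for illustration):

```ruby
# First two lines of the sample log from the question.
SAMPLE_LOG = <<~LOG
  Processing SampleDocumentController#index (for 101.101.101.101 at 2020-12-12 12:00:00) [POST]
    Session ID: sdfgs923jks0dm23mlasf3da9asfjvyur
LOG

# The regex from the question, unchanged.
QUESTION_REGEX = /\[([A-Z]+)\]\D+([a-zA-Z0-9]{32}$)/i

# Line by line, as in test.rb: returns only the lines that match on their own.
def matching_lines(text)
  text.each_line.select { |line| line.match?(QUESTION_REGEX) }
end

# Whole text: \D+ also matches "\n", so the pattern can span both lines.
def whole_text_match?(text)
  text.match?(QUESTION_REGEX)
end

p matching_lines(SAMPLE_LOG).size   # how many lines match by themselves
p whole_text_match?(SAMPLE_LOG)     # whether the text matches as a whole
```

No single line matches, while the text as a whole does, which suggests reading the file into one string (for example with File.read) before applying a pattern that spans lines.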

Upvotes: 3
