user1932914

Reputation: 186

Extracting specific fields from data in Ruby

This is a Ruby program in which I have to extract specific fields from the data in a file using a regular expression. The data in the file is in the following format:

Nov 13 01:46:57 10.232.47.76 qas-adaptiveip-10-232-47-76 2015-11-13 01:46:57 +0000 [info]: qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d has been deregistered adap_tdagt

I need to extract the following values:
1) 2015-11-13 01:46:57 +0000
2) qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d

I have written the code, but it's not working properly. Can someone please help me with this problem?

  class Task5
  def initialize
  #   @f=File.open('C:/Users/aroraku/Desktop,boc-adap_td-agent.log-2.log',r)
  @count=0
  end

  def check_line(line)
      if(line=~/deregistered adap_tdagt$/)
           line=~ (/.*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} +\d{4})/)
               puts "#{$1}"
      end
  end

  def file_read
     open("boc-adap_td-agent.log-2.log") { |f|
          while line=f.gets do
             check_line(line)
          end
     }
    # return @count
  end
end

Upvotes: 3

Views: 233

Answers (2)

Cary Swoveland

Reputation: 110725

str = "Nov 13 01:46:57 10.232.47.76 qas-adaptiveip-10-232-47-76 2015-11-13 01:46:57 +0000 [info]: qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d has been deregistered adap_tdagt"

As the problem with your code has been identified, I would like to suggest another way to extract the desired information from each line:

r = /
    (?:                # begin a non-capture group
      \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s\+\d{4} # match date string
    )                  # end non-capture group
    |                  # or
    (?:                # begin a non-capture group
      (?<=\[info\]:\s) # match "[info]: " in a positive lookbehind
      \S+              # match >= 1 characters other than whitespace
    )                  # end non-capture group
    /x                 # extended/free-spacing regex definition mode

str.scan(r)
  #=> ["2015-11-13 01:46:57 +0000", "qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d"] 
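To tie this back to the file loop in the question, here is a minimal sketch that applies the same pattern only to lines ending in "deregistered adap_tdagt". The method name `extract_fields` is hypothetical, the sample array stands in for the log file, and its second line is made up purely to show a non-matching case.

```ruby
# Hypothetical helper: returns [timestamp, id] for a "deregistered" line,
# or nil for any other line.
def extract_fields(line)
  return nil unless line =~ /deregistered adap_tdagt$/

  pattern = /
    \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s\+\d{4}  # timestamp with offset
    |                                              # or
    (?<=\[info\]:\s)\S+                            # token right after "[info]: "
  /x

  line.scan(pattern)
end

# Stand-in for the log file; the second line is invented for illustration.
lines = [
  "Nov 13 01:46:57 10.232.47.76 qas-adaptiveip-10-232-47-76 2015-11-13 01:46:57 +0000 [info]: qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d has been deregistered adap_tdagt",
  "Nov 13 01:47:02 10.232.47.76 qas-adaptiveip-10-232-47-76 2015-11-13 01:47:02 +0000 [info]: some other message"
]

results = lines.map { |l| extract_fields(l) }.compact
results.each { |date, id| puts "#{date} #{id}" }
# prints: 2015-11-13 01:46:57 +0000 qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d
```

When reading from a real file, `File.foreach` could replace the array, with the same helper called on each line.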

Upvotes: 4

Inpego

Reputation: 2667

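You must escape the + sign in the date pattern. Unescaped, " +" is a quantifier meaning one or more spaces, so the literal + in +0000 never matches: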

line =~ /.*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \+\d{4}).+([a-z]{3}-[a-f0-9]{40})/
puts $1 # 2015-11-13 01:46:57 +0000
puts $2 # qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d
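As a quick check of the escaping point, the sketch below runs both forms of the date pattern against the sample line from the question, then shows the same extraction with named captures instead of `$1` and `$2`. The variable names and the capture names `date` and `id` are illustrative, and `Regexp#match?` assumes Ruby 2.4 or later.

```ruby
line = "Nov 13 01:46:57 10.232.47.76 qas-adaptiveip-10-232-47-76 2015-11-13 01:46:57 +0000 [info]: qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d has been deregistered adap_tdagt"

# " +" here means "one or more spaces", so "+0000" cannot match.
unescaped = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} +\d{4}/
# "\+" matches a literal plus sign.
escaped   = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \+\d{4}/

puts unescaped.match?(line)  # false
puts escaped.match?(line)    # true

# Same extraction with named captures (names are illustrative).
m = line.match(/(?<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \+\d{4}).+?(?<id>[a-z]{3}-[a-f0-9]{40})/)
puts m[:date]  # 2015-11-13 01:46:57 +0000
puts m[:id]    # qas-296d1fa95fd0ac5a84ea73234c0c48d64f6ea22d
```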

Upvotes: 3
