karthikeayan

Reputation: 5000

Ruby: find words in a text file and count them for each title?

I have the string below in a single file. All three sections are in the same file, and it may go up to HEAD-N.

From the string below, I want a report like:

for HEAD-1 4 not started

for HEAD-2 2 started

for HEAD-3 1 started, 2 not started

HEAD-1
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     not started
ASDF2   192.168.1.1     not started
ASDF3   192.168.1.1     not started

HEAD-2
========
NE      Server
ASDF    192.168.1.1     started
ASDF1   192.168.1.1     started

HEAD-3
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     started
ASDF3   192.168.1.1     not started

I tried using a regexp in Ruby: collect all the HEAD titles into one array, then all the NE items into another, 2-D array.

(.*\n{1})(==*\s+)(.\s+)

This only matches up to NE Server; I want the regex to match across multiple lines.

I may be wrong to use a regex here; if so, I'll try a different approach.
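On the multiline point specifically: in Ruby, ^ and $ always match at line boundaries, and the /m flag makes . match newlines as well. A minimal sketch, assuming every section looks like the sample (a HEAD-n line, a row of =, then the NE Server header) and that any server row not ending in "not started" counts as started; the SECTION name and the abbreviated input are illustrative only:

```ruby
# Abbreviated copy of the sample input (two sections).
text = <<~TEXT
  HEAD-1
  ========
  NE      Server
  ASDF    192.168.1.1     not started
  ASDF1   192.168.1.1     not started

  HEAD-2
  ========
  NE      Server
  ASDF    192.168.1.1     started
TEXT

# /m lets `.` match "\n", so the lazy (.*?) can span several lines; it stops
# where the lookahead sees the next "HEAD-n" line or the end of the string.
SECTION = /^(HEAD-\d+)\n=+\n(.*?)(?=^HEAD-\d+$|\z)/m

report = text.scan(SECTION).map do |head, body|
  # Keep only server rows: drop blank lines and the "NE  Server" header.
  rows = body.lines.map(&:strip).reject { |l| l.empty? || l.start_with?("NE ") }
  not_started = rows.count { |r| r.end_with?("not started") }
  # Assumption: every other server row is "started".
  [head, { started: rows.size - not_started, not_started: not_started }]
end.to_h

report.each do |head, c|
  puts "for #{head} #{c[:started]} started, #{c[:not_started]} not started"
end
```

The lookahead (?=^HEAD-\d+$|\z) lets one match end exactly where the next section begins without consuming the next title, so scan can pick that title up on its next match.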

Thanks in advance.

Upvotes: 0

Views: 379

Answers (4)

Arup Rakshit

Reputation: 118271

Here is a different attempt using CSV:

require 'csv' 

csv_string = <<_
HEAD-1
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     not started
ASDF2   192.168.1.1     not started
ASDF3   192.168.1.1     not started

HEAD-2
========
NE      Server
ASDF    192.168.1.1     started
ASDF1   192.168.1.1     started

HEAD-3
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     started
ASDF3   192.168.1.1     not started
_

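options = { :col_sep => " ", :skip_blanks => true, :skip_lines => /[=]+/ }

# Ruby 3 no longer turns a trailing Hash into keyword arguments, so splat it.
# col_sep " " splits on every single space, so runs of spaces yield nil fields; drop them.
csv_array = CSV.parse(csv_string, **options).map(&:compact)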

csv_array.slice_before { |a| a.first[/head-\d+/i] }.to_a
# => [[["HEAD-1"],
#      ["NE", "Server"],
#      ["ASDF", "192.168.1.1", "not", "started"],
#      ["ASDF1", "192.168.1.1", "not", "started"],
#      ["ASDF2", "192.168.1.1", "not", "started"],
#      ["ASDF3", "192.168.1.1", "not", "started"]],
#     [["HEAD-2"],
#      ["NE", "Server"],
#      ["ASDF", "192.168.1.1", "started"],
#      ["ASDF1", "192.168.1.1", "started"]],
#     [["HEAD-3"],
#      ["NE", "Server"],
#      ["ASDF", "192.168.1.1", "not", "started"],
#      ["ASDF1", "192.168.1.1", "started"],
#      ["ASDF3", "192.168.1.1", "not", "started"]]]
report = csv_array.slice_before { |a| a.first[/head-\d+/i] }.map do |inner_ary|
  # key is the ["HEAD-n"] row; the ["NE", "Server"] header row is discarded
  key, _ = inner_ary.shift(2)
  not_started, started = inner_ary.partition { |a| a.join(" ")[/\s+not\s+started$/] }
  key.push(["started #{started.size}", "not started #{not_started.size}"])
end
Hash[report]
# => {"HEAD-1"=>["started 0", "not started 4"],
#     "HEAD-2"=>["started 2", "not started 0"],
#     "HEAD-3"=>["started 1", "not started 2"]}
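A note on the CSV route: with col_sep " ", CSV treats every single space as a separator, so the runs of spaces in the sample turn into empty (nil) fields. A sketch of the same slice_before idea without CSV, assuming the columns are simply whitespace-separated: String#split with no argument splits on runs of whitespace, so no empty fields appear. The input is an abbreviated copy of the sample:

```ruby
input = <<~TEXT
  HEAD-1
  ========
  NE      Server
  ASDF    192.168.1.1     not started
  ASDF1   192.168.1.1     not started

  HEAD-2
  ========
  NE      Server
  ASDF    192.168.1.1     started
TEXT

# split with no argument: "ASDF    192.168.1.1     not started" -> 4 fields.
# Blank lines become [] and the "========" line starts with "=", so drop both.
rows = input.each_line.map(&:split)
            .reject { |fields| fields.empty? || fields.first.start_with?("=") }

sections = rows.slice_before { |fields| fields.first.match?(/\AHEAD-\d+\z/) }

report = sections.map do |section|
  head = section.first.first   # e.g. "HEAD-1"
  servers = section.drop(2)    # skip the HEAD row and the "NE Server" row
  not_started, started = servers.partition { |f| f.last(2) == %w[not started] }
  [head, { started: started.size, not_started: not_started.size }]
end.to_h
```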

Upvotes: 0

hlh

Reputation: 2072

You can try breaking the problem down into smaller parts. Instead of using a complicated regex to match the whole output, split the string into separate "HEAD" sections, then loop through each one and count how many times the substrings "started" and "not started" occur. Here's an untested, rough example of what I mean:

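str = "<your large string here>"
# Split on the HEAD-n title lines (\d+ so HEAD-10 and up also work). The first
# element is whatever precedes HEAD-1 (an empty string here), so drop it.
heads = str.split(/^HEAD-\d+$/).drop(1)
heads.each_with_index do |current_head, i|
  # Requires 2+ whitespace chars before "started", so "not started" is not counted here
  started_count = current_head.scan(/\s\s+started/).length
  not_started_count = current_head.scan(/not started/).length
  # Numbering assumes the HEADs appear in order HEAD-1, HEAD-2, ...
  puts "For HEAD-#{i + 1}: #{started_count} started, #{not_started_count} not started"
end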
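A variation on the same idea, sketched under the same assumption that each title sits on its own line: when the split pattern contains a capture group, String#split keeps the captured text in the result, so the actual HEAD-n names come back interleaved with their sections and you don't have to rely on the index. The input is abbreviated from the question:

```ruby
str = <<~TEXT
  HEAD-1
  ========
  NE      Server
  ASDF    192.168.1.1     not started

  HEAD-2
  ========
  NE      Server
  ASDF    192.168.1.1     started
  ASDF1   192.168.1.1     started
TEXT

# With a capture group the result alternates:
# [text before HEAD-1, "HEAD-1", body1, "HEAD-2", body2, ...]
parts = str.split(/^(HEAD-\d+)$/)
parts.shift # drop whatever precedes the first HEAD line (an empty string here)

counts = parts.each_slice(2).map do |head, body|
  started = body.scan(/\s\s+started/).length
  not_started = body.scan(/not started/).length
  puts "For #{head}: #{started} started, #{not_started} not started"
  [head, [started, not_started]]
end.to_h
```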

Upvotes: 0

Eric Dand

Reputation: 1210

If you can assume that the input will be formatted like your example (i.e. one server per line, the "HEAD" title on its own line, etc.), you can use gets to read the input one line at a time and then match each line against a regular expression like ^(\w+)\s+(\d+\.\d+\.\d+\.\d+)\s+(.+). With this regex, you just check whether the last group is "not started". If so, add one to your count of not-started servers; if not, add one to your count of started servers. If the regex doesn't match, check whether the line matches ^HEAD-(\d+) or something similar.
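A minimal sketch of that line-by-line approach, assuming the layout from the question. StringIO stands in for the file here (a File from File.open also responds to gets), and the SERVER_LINE / HEAD_LINE names are just illustrative:

```ruby
require "stringio"

# Abbreviated sample input.
input = StringIO.new(<<~TEXT)
  HEAD-1
  ========
  NE      Server
  ASDF    192.168.1.1     not started
  ASDF1   192.168.1.1     not started

  HEAD-2
  ========
  NE      Server
  ASDF    192.168.1.1     started
  ASDF1   192.168.1.1     not started
TEXT

SERVER_LINE = /^(\w+)\s+(\d+\.\d+\.\d+\.\d+)\s+(.+)$/
HEAD_LINE   = /^(HEAD-\d+)$/

# Each new title gets zeroed counters on first access.
counts = Hash.new { |h, k| h[k] = { started: 0, not_started: 0 } }
current = nil

while (line = input.gets)
  if (m = SERVER_LINE.match(line))
    status = m[3] == "not started" ? :not_started : :started
    counts[current][status] += 1 if current
  elsif (m = HEAD_LINE.match(line))
    current = m[1]
    counts[current] # touch the key so a section with no servers still shows up
  end
  # "========", the "NE  Server" header and blank lines match neither pattern.
end

counts.each do |head, c|
  puts "for #{head} #{c[:started]} started, #{c[:not_started]} not started"
end
```

Keeping only the current title and a counter per title means that, with a real file, the input never has to be held in memory as one string.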

Upvotes: 0

Daniël Knippers

Reputation: 3055

Here is an approach using regular expressions, where string holds the whole input. The regular expression should be improved for production use, e.g. to look for started / not started only in the status column rather than anywhere in the section (including server names, etc.).

status = {}
string.scan(/^(HEAD-\d+)(.*?)(?:\n\n|\Z)/m).each do |match|
  name, text = match
  started = text.scan(/(?<!not )started/).size
  not_started = text.scan(/not started/).size
  status[name] = {
    started: started,
    not_started: not_started
  }
end

status
# => {"HEAD-1"=>{:started=>0, :not_started=>4}, "HEAD-2"=>{:started=>2, :not_started=>0}, "HEAD-3"=>{:started=>1, :not_started=>2}}
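Along the lines of the production caveat above, a sketch that only reads the status at the end of each server row (after the IPv4 address) and then prints the report in the format from the question, leaving out zero counts. The server name restarted1 is a made-up example: the whole-text (?<!not )started scan would also count the "started" inside that name, while the row pattern does not. STATUS is an illustrative name:

```ruby
string = <<~TEXT
  HEAD-1
  ========
  NE      Server
  ASDF    192.168.1.1     not started
  ASDF1   192.168.1.1     not started

  HEAD-2
  ========
  NE      Server
  restarted1  192.168.1.1     started
TEXT

# Status = whatever follows the IPv4 address at the end of a server row.
# [ \t] instead of \s keeps each match on a single line.
STATUS = /^\w+[ \t]+\d+\.\d+\.\d+\.\d+[ \t]+(not started|started)[ \t]*$/

status = {}
string.scan(/^(HEAD-\d+)(.*?)(?:\n\n|\Z)/m).each do |name, text|
  states = text.scan(STATUS).flatten # e.g. ["not started", "not started"]
  status[name] = {
    started: states.count("started"),
    not_started: states.count("not started")
  }
end

# Build the report in the question's format, leaving out zero counts.
report = status.map do |name, c|
  parts = []
  parts << "#{c[:started]} started" if c[:started] > 0
  parts << "#{c[:not_started]} not started" if c[:not_started] > 0
  "for #{name} #{parts.join(', ')}"
end
puts report
```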

Upvotes: 1
