Reputation: 563
I'm reading a log file and trying to organize the data in the format below. I want to use each NAME (i.e. USOLA51, USOLA10, ...) as a hash key and build the corresponding array of LIST and DETAILS values. I've created the hash, but I'm not sure how to extract the associated array values.
Expected output:
NAME     LIST      DETAILS
USOLA51  ICC_ONUS  .035400391
         PA_ONUS   .039800391
         PE_ONUS   .000610352
USOLA10  PAL       52.7266846
         CFG_ONUS  15.9489746
and likewise for the other values.
Log file:
--- data details ----
USOLA51
ONUS size
------------------------------ ----------
ICC_ONUS .035400391
PA_ONUS .039800391
PE_ONUS .000610352
=========================================
---- data details ----
USOLA10
ONUS size
------------------------------ ----------
PAL 52.7266846
CFG_ONUS 15.9489746
=========================================
---- data details ----
USOLA55
ONUS size
------------------------------ ----------
PA_ONUS 47.4707031
PAL 3.956604
ICC_ONUS .020385742
PE_ONUS .000610352
=========================================
---- data details ----
USOLA56
ONUS size
------------------------------ ----------
=========================================
What I've tried:
unique = Array.new
owner = Array.new
db = Array.new
File.read("mydb_size.log").each_line do |line|
next if line =~ /---- data details ----|^ONUS|---|=======/
unique << line.strip if line =~ /^U.*\d/
end
hash = Hash[unique.collect { |item| [item, ""] } ]
puts hash
Current output:
{"USOLA51"=>"", "USOLA10"=>"", "USOLA55"=>"", "USOLA56"=>""}
Any help to move forward would be appreciated. Thanks!
Upvotes: 1
Views: 131
Reputation: 11070
While your log file isn't CSV, I find the csv library useful for a lot of non-CSV parsing. You can use it to parse your log file by skipping blank lines and any line starting with ---, ===, or ONUS, with a single space as the column separator:
require "csv"

csv = CSV.read("./example.log", skip_lines: /\A(---|===|ONUS)/,
               skip_blanks: true, col_sep: " ")
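To see what rows these options produce, here is a minimal sketch that runs them through CSV.parse on an inline string instead of a file. The SAMPLE_LOG constant and the parse_log helper are hypothetical names introduced for illustration, and the sample assumes the columns are separated by a single space, as in the log shown in the question.

```ruby
require "csv"

# Hypothetical inline sample; assumes single-space column separators.
SAMPLE_LOG = <<~LOG
  ---- data details ----
  USOLA51
  ONUS size
  ------------------------------ ----------
  ICC_ONUS .035400391
  PA_ONUS .039800391
  =========================================
LOG

# parse_log is an illustrative helper, not part of the csv library.
def parse_log(text)
  CSV.parse(text, skip_lines: /\A(---|===|ONUS)/,
                  skip_blanks: true, col_sep: " ")
end

# One-element rows are names; two-element rows are list/detail pairs.
p parse_log(SAMPLE_LOG)
```

If the real file pads its columns with runs of spaces, a single-space col_sep would likely yield empty fields between the values, so the whitespace may need to be squeezed before parsing.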
Then, some lines have only one element in the parsed array; those are your header (name) lines. So we can split the csv array into groups, starting a new group whenever a row has only one element, and create a hash from the result:
output_hash = csv.slice_before { |row| row.length == 1 }.
  each_with_object({}) do |((name), *rows), hash|
    hash[name] = rows.to_h
  end
Now, it's a little hard to tell if you wanted the hash output as the text you showed, or if you just wanted the hash. If you want the text output, we'll first need to see how much room each column needs to be displayed:
name_length = output_hash.keys.max_by(&:length).length
list_length = output_hash.values.flat_map(&:keys).max_by(&:length).length
detail_length = output_hash.values.flat_map(&:values).max_by(&:length).length
format = "%-#{name_length}s %-#{list_length}s %-#{detail_length}s"
Then we can output the header row and all the values in output_hash, skipping any name that has no values:
puts("#{format}\n\n" % ["NAME", "LIST", "DETAILS"])
output_hash.reject { |name, values| values.empty? }.each do |name, values|
  list, detail = values.first
  puts(format % [name, list, detail])
  values.drop(1).each do |list, detail|
    puts(format % ['', list, detail])
  end
  puts
end
and the result:
NAME    LIST     DETAILS

USOLA51 ICC_ONUS .035400391
        PA_ONUS  .039800391
        PE_ONUS  .000610352

USOLA10 PAL      52.7266846
        CFG_ONUS 15.9489746

USOLA55 PA_ONUS  47.4707031
        PAL      3.956604
        ICC_ONUS .020385742
        PE_ONUS  .000610352
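As a small illustration of the format string used above: a %-Ns directive left-justifies a value in a field N characters wide, which is what keeps the columns aligned. The row_format helper below is a hypothetical name, and the widths passed to it are the ones this sample data happens to need.

```ruby
# row_format is an illustrative helper: it builds a left-justified,
# three-column format string from the given column widths.
def row_format(name_width, list_width, detail_width)
  "%-#{name_width}s %-#{list_width}s %-#{detail_width}s"
end

fmt = row_format(7, 8, 10)
puts fmt % ["NAME", "LIST", "DETAILS"]
puts fmt % ["USOLA10", "PAL", "52.7266846"]
# An empty name still occupies the full NAME column width.
puts fmt % ["", "CFG_ONUS", "15.9489746"]
```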
It's a little hard (for me) to explain what slice_before does. It takes an array (or other enumerable) and splits it into groups of consecutive elements, starting a new group at each element that matches the parameter or for which the block returns true. For instance, if we had a smaller array:
array = ["slice here", 1, 2, "slice here", 3, 4]
array.slice_before { |el| el == "slice here" }.entries
# => [["slice here", 1, 2], ["slice here", 3, 4]]
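slice_before also accepts a pattern instead of a block; each element is tested against the pattern with ===, and a new group starts at every match. A minimal sketch with made-up data (the group_by_marker helper and the sample lines are hypothetical):

```ruby
# group_by_marker is an illustrative helper: a leading "#" marks
# the first element of each group.
def group_by_marker(lines)
  # Passing a pattern: a new slice begins at each element the regexp matches.
  lines.slice_before(/\A#/).to_a
end

p group_by_marker(["# first", "a", "b", "# second", "c"])
# => [["# first", "a", "b"], ["# second", "c"]]
```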
We told slice_before that we want each group to begin with an element equal to "slice here", so we get two groups back: the first element of each is "slice here", and the remaining elements are everything in the array up to the next "slice here".
So then we take that result and call each_with_object on it, passing an empty hash to start with. With each_with_object, the first block parameter is the current element of the array (as with each) and the second is the object you passed in. When the block parameters look like |((name), *rows), hash|, that first parameter (the element of the array) gets destructured into its first element and the remaining elements:
# the array here is what gets passed to `each_with_object` for the first iteration as the first parameter
name, *rows = [["USOLA51"], ["ICC_ONUS", ".035400391"], ["PA_ONUS", ".039800391"], ["PE_ONUS", ".000610352"]]
name # => ["USOLA51"]
rows # => [["ICC_ONUS", ".035400391"], ["PA_ONUS", ".039800391"], ["PE_ONUS", ".000610352"]]
So then, we deconstruct that first element again, just so we don't have it in an array:
name, * = name # the `, *` isn't needed in the block parameters, but is needed when you run these examples in irb
name # => "USOLA51"
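Putting the two destructuring steps together, the sketch below runs the same block parameters over a small hand-written array of rows, so the grouping can be tried without a log file. The rows_to_hash name and the sample rows are illustrative assumptions.

```ruby
# rows_to_hash is an illustrative helper wrapping the grouping step.
def rows_to_hash(rows)
  rows.slice_before { |row| row.length == 1 }
      .each_with_object({}) do |((name), *pairs), hash|
        # (name) unwraps the one-element header row; *pairs collects the rest.
        hash[name] = pairs.to_h
      end
end

rows = [["USOLA51"], ["ICC_ONUS", ".035400391"], ["PA_ONUS", ".039800391"],
        ["USOLA56"]]
result = rows_to_hash(rows)

p result["USOLA51"].keys # the two list names under USOLA51
p result["USOLA56"]      # a name with no data rows maps to an empty hash
```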
For the max_by(&:length).length calls, all we're doing is finding the longest element in the array (returned by either keys or values) and getting its length:
output_hash = {"USOLA51"=>{"ICC_ONUS"=>".035400391", "PA_ONUS"=>".039800391", "PE_ONUS"=>".000610352"}, "USOLA10"=>{"PAL"=>"52.7266846", "CFG_ONUS"=>"15.9489746"}, "USOLA55"=>{"PA_ONUS"=>"47.4707031", "PAL"=>"3.956604", "ICC_ONUS"=>".020385742", "PE_ONUS"=>".000610352"}, "USOLA56"=>{}}
output_hash.values.flat_map(&:keys)
# => ["ICC_ONUS", "PA_ONUS", "PE_ONUS", "PAL", "CFG_ONUS", "PA_ONUS", "PAL", "ICC_ONUS", "PE_ONUS"]
output_hash.values.flat_map(&:keys).map(&:length) # => [8, 7, 7, 3, 8, 7, 3, 8, 7]
output_hash.values.flat_map(&:keys).max_by(&:length) # => "ICC_ONUS"
output_hash.values.flat_map(&:keys).max_by(&:length).length # => 8
Upvotes: 2
Reputation: 2455
It's been a long time since I've worked with Ruby, so I've probably forgotten a lot of the shortcuts and syntactic sugar, but this file looks easy to parse without much effort.
A simple line-by-line check is enough. First, strip surrounding whitespace and ignore blank lines and lines that start with = or -. Next, if a line holds only one value, it is the title; the line after it holds the column names, which can be ignored for your desired output. Once the title and column names have been handled, move on to the following lines and save each key/value pair in a Ruby hash. While doing this, also track the longest string in each column and adjust the column padding, so that you can generate the table-like output afterwards.
# Read the whole log into a string (file name taken from the question)
str = File.read("mydb_size.log")

# Set up the loop
merged = []
current = -1
awaiting_headers = false
columns = ['NAME', 'LIST', 'DETAILS']
# Keep track of the max column length
columns_pad = columns.map { |c| c.length }

str.each_line do |line|
  # Remove surrounding whitespaces,
  # ignore empty or = - lines
  line.strip!
  next if line.empty?
  next if ['-','='].include? line[0]

  # Get the values of this line
  parts = line.split ' '

  # We're not awaiting the headers and
  # there is just one value, must be the title
  if not awaiting_headers and parts.size == 1
    # If this string is longer than the current maximum
    columns_pad[0] = line.length if line.length > columns_pad[0]
    # Create a hash for this item
    merged[current += 1] = {name: line, data: {}}
    # Next must be the headers
    awaiting_headers = true
    next
  end

  # Headers encountered
  if awaiting_headers
    # Just skip it from here
    awaiting_headers = false
    next
  end

  # Take 2 parts of each (should be always only those two)
  # and treat them as key/value
  parts.each_cons(2) do |key, value|
    # Make it a ruby key/value pair
    merged[current][:data][key] = value
    # Check if LIST or DETAILS column length needs to be raised
    columns_pad[1] = key.length if key.length > columns_pad[1]
    columns_pad[2] = value.length if value.length > columns_pad[2]
  end
end
# Adding three spaces between columns
columns_pad.map! { |c| c + 3}
# Writing the headers
result = columns.map.with_index { |c, i| c.ljust(columns_pad[i]) }.join + "\n"

merged.each do |item|
  # Remove the next line if you want to include empty data
  next if item[:data].empty?
  result += "\n"
  result += item[:name].ljust(columns_pad[0])
  # For the first value in data, we don't need extra padding or a line break
  padding = ""
  item[:data].each do |key, value|
    result += padding
    result += key.ljust(columns_pad[1])
    result += value.ljust(columns_pad[2])
    # Set the padding to include a line break and fill up the NAME column with spaces
    padding = "\n" + "".ljust(columns_pad[0])
  end
  result += "\n"
end

puts result
Which will result in:
NAME      LIST       DETAILS

USOLA51   ICC_ONUS   .035400391
          PA_ONUS    .039800391
          PE_ONUS    .000610352

USOLA10   PAL        52.7266846
          CFG_ONUS   15.9489746

USOLA55   PA_ONUS    47.4707031
          PAL        3.956604
          ICC_ONUS   .020385742
          PE_ONUS    .000610352
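One detail worth noting about the loop above: each_cons(2) yields overlapping pairs, so it produces a single key/value pair only because each data line has exactly two parts. If a line could ever hold more than two values, each_slice(2) would give non-overlapping pairs instead. A small sketch of the difference (the PARTS constant holds a made-up line for illustration):

```ruby
# Made-up line with four whitespace-separated values.
PARTS = "A 1 B 2".split(" ")

# each_cons(2): sliding window, so the pairs overlap.
p PARTS.each_cons(2).to_a   # => [["A", "1"], ["1", "B"], ["B", "2"]]

# each_slice(2): consecutive chunks, no overlap.
p PARTS.each_slice(2).to_a  # => [["A", "1"], ["B", "2"]]
```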
Upvotes: 1