voltas

Reputation: 563

Ruby data formatting

I'm reading a log file and trying to organize the data in the format below. I want to use each NAME (i.e. USOLA51, USOLA10, ...) as a hash key and build the corresponding arrays for LIST and DETAILS. I've created the hash, but I'm not sure how to extract the associated array values.

Expected Output

NAME           LIST             DETAILS

USOLA51        ICC_ONUS         .035400391
               PA_ONUS          .039800391
               PE_ONUS          .000610352

USOLA10        PAL              52.7266846
               CFG_ONUS         15.9489746

Likewise for the other values.

Log file:

--- data details ----

USOLA51

ONUS                    size
------------------------------ ----------
ICC_ONUS               .035400391
PA_ONUS            .039800391
PE_ONUS            .000610352

=========================================


---- data details ----


USOLA10


ONUS                    size
------------------------------ ----------
PAL                52.7266846
CFG_ONUS               15.9489746


=========================================

---- data details ----


USOLA55


ONUS                    size
------------------------------ ----------
PA_ONUS            47.4707031
PAL              3.956604
ICC_ONUS               .020385742
PE_ONUS            .000610352


=========================================


---- data details ----

USOLA56

ONUS                    size
------------------------------ ----------

=========================================

What I've tried:

unique = Array.new
owner = Array.new
db = Array.new
File.read("mydb_size.log").each_line do |line|
  next if line =~ /---- data details ----|^ONUS|---|=======/   
  unique << line.strip if line =~ /^U.*\d/ 

end

hash = Hash[unique.collect { |item| [item, ""] } ]

puts hash

Current output:

{"USOLA51"=>"", "USOLA10"=>"", "USOLA55"=>"", "USOLA56"=>""}

Any help moving forward would be much appreciated. Thanks!

Upvotes: 1

Views: 131

Answers (2)

Simple Lime

Reputation: 11070

While your log file isn't CSV, I find the csv library useful for a lot of non-CSV parsing. You can use it to parse your log file by skipping blank lines and any line starting with ---, ===, or ONUS. Your column separator is a space character:

require "csv"

csv = CSV.read("./example.log", skip_lines: /\A(---|===|ONUS)/,
               skip_blanks: true, col_sep: " ")

Some of the parsed rows contain only 1 element; those are your header (name) lines. So we can split the csv array into groups, starting a new group at every 1-element row, and create a hash from the result:

output_hash = csv.slice_before { |row| row.length == 1 }.
  each_with_object({}) do |((name), *rows), hash|
  hash[name] = rows.to_h
end
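If the columns in your real log are separated by runs of several spaces rather than a single space, CSV with col_sep: " " will produce empty (nil) fields between the values. Here is a minimal sketch of the same grouping done with String#split instead, which splits on any run of whitespace. The method name parse_sizes and the sample heredoc are made up for illustration:

```ruby
# Hypothetical helper (not part of any library): groups the
# whitespace-separated rows of the log under their name line.
def parse_sizes(text)
  rows = text.each_line
             .map(&:strip)
             .reject { |line| line.empty? || line.match?(/\A(---|===|ONUS)/) }
             .map(&:split) # split with no argument splits on runs of whitespace

  # A row with a single field is a name line and starts a new group.
  rows.slice_before { |row| row.length == 1 }
      .each_with_object({}) do |((name), *pairs), hash|
    hash[name] = pairs.to_h
  end
end

# Shortened sample in the same shape as the log in the question.
sample = <<~LOG
  ---- data details ----

  USOLA51

  ONUS                    size
  ------------------------------ ----------
  ICC_ONUS               .035400391
  PA_ONUS            .039800391

  =========================================

  ---- data details ----

  USOLA56

  ONUS                    size
  ------------------------------ ----------

  =========================================
LOG

sizes = parse_sizes(sample)
p sizes["USOLA51"].keys   # the LIST values for USOLA51
p sizes["USOLA51"].values # the DETAILS values for USOLA51
```

Since the question asks for arrays, keys and values on each inner hash give you the LIST and DETAILS arrays for a name.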

Now, it's a little hard to tell if you wanted the hash output as the text you showed, or if you just wanted the hash. If you want the text output, we'll first need to see how much room each column needs to be displayed:

name_length = output_hash.keys.max_by(&:length).length
list_length = output_hash.values.flat_map(&:keys).max_by(&:length).length
detail_length = output_hash.values.flat_map(&:values).max_by(&:length).length

format = "%-#{name_length}s %-#{list_length}s %-#{detail_length}s"

and then we can output the header row and all the values in output_hash, but only if they have any values:

puts("#{format}\n\n" % ["NAME", "LIST", "DETAILS"])

output_hash.reject { |name, values| values.empty? }.each do |name, values|
  list, detail = values.first
  puts(format % [name, list, detail])

  values.drop(1).each do |list, detail|
    puts(format % ['', list, detail])
  end

  puts
end

and the result:

NAME    LIST     DETAILS   

USOLA51 ICC_ONUS .035400391
        PA_ONUS  .039800391
        PE_ONUS  .000610352

USOLA10 PAL      52.7266846
        CFG_ONUS 15.9489746

USOLA55 PA_ONUS  47.4707031
        PAL      3.956604  
        ICC_ONUS .020385742
        PE_ONUS  .000610352
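In case the format string is unclear: %-8s means "a string, left-justified and padded with spaces to 8 characters". A small sketch of how the widths turn into a format string; the helper name row_format and the hard-coded widths are only for illustration:

```ruby
# Hypothetical helper: builds a left-justified format string
# from a list of column widths, e.g. [7, 8] => "%-7s %-8s".
def row_format(widths)
  widths.map { |width| "%-#{width}s" }.join(" ")
end

fmt = row_format([7, 8, 10])
puts fmt % ["USOLA51", "ICC_ONUS", ".035400391"]
puts fmt % ["", "PAL", "3.956604"] # an empty name still takes up 7 characters
```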

slice_before is a little hard (for me) to explain. It takes an array (or other enumerable) and splits it into groups, starting a new group at each element that matches the parameter or for which the block returns true. For instance, with a smaller array:

array = ["slice here", 1, 2, "slice here", 3, 4]
array.slice_before { |el| el == "slice here" }.entries
# => [["slice here", 1, 2], ["slice here", 3, 4]]

We told slice_before that each group should begin with an element equal to "slice here", so we get 2 groups back. The first element in each is "slice here", and the remaining elements are everything in the array up to the next "slice here".
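slice_before also accepts a pattern instead of a block, and compares it to each element with ===. A small sketch of that form; the helper name group_by_name and the regex for name lines are assumptions for illustration, not something your log format guarantees:

```ruby
# Hypothetical helper: starts a new group at every element
# that matches name_pattern (compared using ===).
def group_by_name(lines, name_pattern = /\AUSOLA\d+\z/)
  lines.slice_before(name_pattern).to_a
end

p group_by_name(["USOLA51", "ICC_ONUS", "PA_ONUS", "USOLA10", "PAL"])
# => [["USOLA51", "ICC_ONUS", "PA_ONUS"], ["USOLA10", "PAL"]]
```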

Next, we call each_with_object on the result of slice_before, passing an empty hash as the starting object. With each_with_object, the block's first parameter is the current element (as with each) and the second is the object you passed in. When the block parameters look like |((name), *rows), hash|, that first parameter (the element) gets destructured into its first element and the remaining elements:

# the array here is what gets passed to `each_with_object` for the first iteration as the first parameter
name, *rows = [["USOLA51"], ["ICC_ONUS", ".035400391"], ["PA_ONUS", ".039800391"], ["PE_ONUS", ".000610352"]]
name # => ["USOLA51"]
rows # => [["ICC_ONUS", ".035400391"], ["PA_ONUS", ".039800391"], ["PE_ONUS", ".000610352"]]

So then, we deconstruct that first element again, just so we don't have it in an array:

name, * = name # the `, *` isn't needed in the block parameters, but is needed when you run these examples in irb
name # => "USOLA51"
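Putting both destructuring steps together in the block parameters, here is a minimal sketch on hand-written groups. The method name to_nested_hash and the sample data are only for illustration:

```ruby
# Hypothetical helper: turns groups shaped like
# [[name], [key, value], [key, value], ...] into { name => { key => value } }.
def to_nested_hash(groups)
  groups.each_with_object({}) do |((name), *rows), hash|
    # (name) unwraps the one-element name row; *rows collects the rest.
    hash[name] = rows.to_h
  end
end

groups = [
  [["USOLA10"], ["PAL", "52.7266846"], ["CFG_ONUS", "15.9489746"]],
  [["USOLA56"]] # a name with no rows ends up with an empty hash
]

result = to_nested_hash(groups)
result["USOLA10"]["PAL"] # => "52.7266846"
result["USOLA56"].empty? # => true
```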

For the max_by(&:length).length, all we're doing is finding the longest element in the array (returned by either keys or values) and getting the length of it:

output_hash = {"USOLA51"=>{"ICC_ONUS"=>".035400391", "PA_ONUS"=>".039800391", "PE_ONUS"=>".000610352"}, "USOLA10"=>{"PAL"=>"52.7266846", "CFG_ONUS"=>"15.9489746"}, "USOLA55"=>{"PA_ONUS"=>"47.4707031", "PAL"=>"3.956604", "ICC_ONUS"=>".020385742", "PE_ONUS"=>".000610352"}, "USOLA56"=>{}}
output_hash.values.flat_map(&:keys)
# => ["ICC_ONUS", "PA_ONUS", "PE_ONUS", "PAL", "CFG_ONUS", "PA_ONUS", "PAL", "ICC_ONUS", "PE_ONUS"]
output_hash.values.flat_map(&:keys).map(&:length) # => [8, 7, 7, 3, 8, 7, 3, 8, 7]
output_hash.values.flat_map(&:keys).max_by(&:length) # => "ICC_ONUS"
output_hash.values.flat_map(&:keys).max_by(&:length).length # => 8

Upvotes: 2

wiesion

Reputation: 2455

It's been a long time since I've worked with Ruby, so I've probably forgotten a lot of the shortcuts and syntactic sugar, but this file seems easy to parse without much effort.

A simple line-by-line comparison against expected values is enough. First, strip surrounding whitespace and skip blank lines and lines that start with = or -. Next, if a line has only one value, it is the title; the line after it holds the column names, which can be ignored for your desired output. Once the title and column names have been handled, save each following key/value line as a Ruby hash entry. While doing this, also track the longest string seen in each column, so that you can generate the table-like output with padding afterwards.

# Set up the loop
merged = []
current = -1
awaiting_headers = false
columns = ['NAME', 'LIST', 'DETAILS']
# Keep track of the max column length
columns_pad = columns.map { |c| c.length }

# `str` is assumed to hold the log contents, e.g. str = File.read("mydb_size.log")
str.each_line do |line|
  # Remove surrounding whitespaces, 
  # ignore empty or = - lines
  line.strip!
  next if line.empty?
  next if ['-','='].include? line[0]
  # Get the values of this line
  parts = line.split ' '
  # We're not awaiting the headers and 
  # there is just one value, must be the title
  if not awaiting_headers and parts.size == 1
    # If this string is longer than the current maximum
    columns_pad[0] = line.length if line.length > columns_pad[0]
    # Create a hash for this item
    merged[current += 1] = {name: line, data: {}}
    # Next must be the headers
    awaiting_headers = true
    next
  end
  # Headers encountered
  if awaiting_headers
    # Just skip it from here
    awaiting_headers = false
    next
  end
  # Take 2 parts of each (should be always only those two) 
  # and treat them as key/value
  parts.each_cons(2) do |key, value|
    # Make it a ruby key/value pair
    merged[current][:data][key] = value 
    # Check if LIST or DETAILS column length needs to be raised
    columns_pad[1] = key.length if key.length > columns_pad[1]
    columns_pad[2] = value.length if value.length > columns_pad[2]
  end
end

# Adding three spaces between columns
columns_pad.map! { |c| c + 3}  

# Writing the headers
result = columns.map.with_index { |c, i| c.ljust(columns_pad[i]) }.join + "\n"

merged.each do |item|
  # Remove the next line if you want to include empty data
  next if item[:data].empty?  
  result += "\n"
  result += item[:name].ljust(columns_pad[0])
  # For the first value in data, we don't need extra padding or a line break
  padding = ""
  item[:data].each do |key, value|
    result += padding
    result += key.ljust(columns_pad[1])
    result += value.ljust(columns_pad[2])
    # Set the padding to include a line break and fill up the NAME column with spaces
    padding = "\n" + "".ljust(columns_pad[0])
  end
  result += "\n"
end

puts result

Which will result in

NAME      LIST       DETAILS      

USOLA51   ICC_ONUS   .035400391   
          PA_ONUS    .039800391   
          PE_ONUS    .000610352   

USOLA10   PAL        52.7266846   
          CFG_ONUS   15.9489746   

USOLA55   PA_ONUS    47.4707031   
          PAL        3.956604     
          ICC_ONUS   .020385742   
          PE_ONUS    .000610352   

Online demo here

Upvotes: 1
