sparkle
sparkle

Reputation: 7400

Ruby replace hash key using a Regex

I am parsing an Excel file using Creek. This is the first row (the header):

{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

and all the other rows are:

[ 
 {"A"=>2019-05-16 00:00:00 +0200, "B"=>"TEXT", "C"=>"INR"}, 
 {"A"=>2019-05-20 00:00:00 +0200, "B"=>"TEXT2", "C"=>"EUR"}
]

My goal is to have the same array, where all hash keys are replaced with key of mapping using a regex expression in the values of the mapping hash.

For example, in the header, the keys match these REGEX:

mapping = {
    date: /Date|Data|datum|Fecha/,
    portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
    currency: /Currency|Valuta|Währung|Divisa|Devise/
    }

So I need all data rows to be replaced like this:

[ 
  {"date"=>2019-05-16 00:00:00 +0200, "portfolio_name"=>"TEXT", "currency"=>"INR"}, 
  {"date=>2019-05-20 00:00:00 +0200, "portfolio_name"=>"TEXT2", "currency"=>"EUR"}
]

Upvotes: 1

Views: 283

Answers (2)

Jay-Ar Polidario
Jay-Ar Polidario

Reputation: 6603

TL:DR;

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

header_row = rows.first

mapped_header_row = header_row.inject({}) do |hash, (k, v)|
  mapped_name = mappings.find do |mapped_name, regex|
    v.match? regex
  end&.first

  # defaults to `v.to_sym` (Header Name), if not in mappings
  # you can also raise an Exception here instead if not in mappings, depending on your expectations
  hash[k] = mapped_name || v.to_sym 
  hash
end

mapped_rows = rows[1..-1].map do |row|
  new_row = {}
  row.each do |k, v|
    new_row[mapped_header_row[k]] = v
  end
  new_row
end

puts mapped_rows
# => [
#      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
#      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
#    ]

Given:

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

Steps:

  1. We first extract the first row, to get the column names.

    header_row = rows.first
    puts header_row
    # => {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
    
  2. We need to loop through each of the Hash pairs: (key, value), and we need to find if the "value" corresponds to any of our mappings variable.

    In short for this step, we need to somehow convert (i.e.):

    header_row = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

    into

    mapped_header_row = {"A"=>"date", "B"=>"portfolio_name", "C"=>"currency"}

    And so...

    mapped_header_row = header_row.inject({}) do |hash, (k, v)|
      mapped_name = mappings.find do |mapped_name, regex|
        v.match? regex
      end&.first
    
      # defaults to `v.to_sym` (Header Name), if not in mappings
      # you can also raise an Exception here instead if not in mappings, depending on your expectations
      hash[k] = mapped_name || v.to_sym 
      hash
    end
    
    puts mapped_header_row
    # => {"A"=>"date", "B"=>"portfolio_name", "C"=>"currency"}
    

    See inject

    See find

  3. Now that we have the mapped_header_row (or the "mapped" labels / names for each column), then we can just simply update all of the "keys" of 2nd row until the last row, with the "mapped" name: the keys being "A", "B", and "C"... to be replaced correspondingly with "date", "portfolio_name", and "currency"

    # row[1..-1] means the 2nd element in the array until the last element
    mapped_rows = rows[1..-1].map do |row|
      new_row = {}
      row.each do |k, v|
        new_row[mapped_header_row[k]] = v
      end
      new_row
    end
    
    puts mapped_rows
    # => [
    #      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
    #      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
    #    ]
    

    See map

Upvotes: 1

Vasfed
Vasfed

Reputation: 18464

Detect column names in a separate step. Intermediate mapping will look like {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}, and then you can transform data array.

This is pretty straightforward:

header_mapping = header.transform_values{|v|
  mapping.find{|key,regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}

rows.map{|row|
  row.transform_keys{|k| header_mapping[k].to_s }
}

Code requires Ruby 2.4+ for native Hash#transform_* or ActiveSupport

Upvotes: 4

Related Questions