Reputation: 7400
I am parsing an Excel file using Creek. This is the first row (the header):
{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
and all the other rows are:
[
{"A"=>2019-05-16 00:00:00 +0200, "B"=>"TEXT", "C"=>"INR"},
{"A"=>2019-05-20 00:00:00 +0200, "B"=>"TEXT2", "C"=>"EUR"}
]
My goal is to have the same array, where all hash keys are replaced with key of mapping
using a regex expression in the values of the mapping
hash.
For example, in the header, the keys match these REGEX:
mapping = {
date: /Date|Data|datum|Fecha/,
portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
currency: /Currency|Valuta|Währung|Divisa|Devise/
}
So I need all data rows to be replaced like this:
[
{"date"=>2019-05-16 00:00:00 +0200, "portfolio_name"=>"TEXT", "currency"=>"INR"},
{"date=>2019-05-20 00:00:00 +0200, "portfolio_name"=>"TEXT2", "currency"=>"EUR"}
]
Upvotes: 1
Views: 283
Reputation: 6603
require 'time'
mappings = {
date: /Date|Data|datum|Fecha/,
portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
currency: /Currency|Valuta|Währung|Divisa|Devise/
}
rows = [
{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
{"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"},
{"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]
header_row = rows.first
mapped_header_row = header_row.inject({}) do |hash, (k, v)|
mapped_name = mappings.find do |mapped_name, regex|
v.match? regex
end&.first
# defaults to `v.to_sym` (Header Name), if not in mappings
# you can also raise an Exception here instead if not in mappings, depending on your expectations
hash[k] = mapped_name || v.to_sym
hash
end
mapped_rows = rows[1..-1].map do |row|
new_row = {}
row.each do |k, v|
new_row[mapped_header_row[k]] = v
end
new_row
end
puts mapped_rows
# => [
# {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
# {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
# ]
require 'time'
mappings = {
date: /Date|Data|datum|Fecha/,
portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
currency: /Currency|Valuta|Währung|Divisa|Devise/
}
rows = [
{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
{"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"},
{"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]
We first extract the first row, to get the column names.
header_row = rows.first
puts header_row
# => {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
We need to loop through each of the Hash pairs: (key, value), and we need to find if the "value" corresponds to any of our mappings
variable.
In short for this step, we need to somehow convert (i.e.):
header_row = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
into
mapped_header_row = {"A"=>"date", "B"=>"portfolio_name", "C"=>"currency"}
And so...
mapped_header_row = header_row.inject({}) do |hash, (k, v)|
mapped_name = mappings.find do |mapped_name, regex|
v.match? regex
end&.first
# defaults to `v.to_sym` (Header Name), if not in mappings
# you can also raise an Exception here instead if not in mappings, depending on your expectations
hash[k] = mapped_name || v.to_sym
hash
end
puts mapped_header_row
# => {"A"=>"date", "B"=>"portfolio_name", "C"=>"currency"}
See inject
See find
Now that we have the mapped_header_row
(or the "mapped" labels / names for each column), then we can just simply update all of the "keys" of 2nd row until the last row, with the "mapped" name: the keys being "A", "B", and "C"... to be replaced correspondingly with "date", "portfolio_name", and "currency"
# row[1..-1] means the 2nd element in the array until the last element
mapped_rows = rows[1..-1].map do |row|
new_row = {}
row.each do |k, v|
new_row[mapped_header_row[k]] = v
end
new_row
end
puts mapped_rows
# => [
# {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
# {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
# ]
See map
Upvotes: 1
Reputation: 18464
Detect column names in a separate step. Intermediate mapping will look like {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}
, and then you can transform data array.
This is pretty straightforward:
header_mapping = header.transform_values{|v|
mapping.find{|key,regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}
rows.map{|row|
row.transform_keys{|k| header_mapping[k].to_s }
}
Code requires Ruby 2.4+ for native Hash#transform_*
or ActiveSupport
Upvotes: 4