Reputation: 21
I'm trying to write a Ruby program which will parse the following TSV file and loop over each record, adding each shop name (last column) as the key in a hash and the associated price (second column) as the value to each corresponding key:
White Bread £1.20 Baker
Whole Milk £0.80 Corner Shop
Gorgonzola £10.20 Cheese Shop
Mature Cheddar £5.20 Cheese Shop
Limburger £6.35 Cheese Shop
Newspaper £1.20 Corner Shop
Ilchester £3.99 Cheese Shop
So the aim is to end up with a hash with entries in the following format: shop => price.
Here's the code I've got so far:
totals = {}
File.open("shopping.tsv") do |file|
  records = file.each_line.map { |line| line.chomp.split("\t") }
  records.each { |_, price, shop| totals[shop.to_sym] = price }
  puts(totals)
end
This produces an incorrect output with only some of the records parsed and added to the totals hash (and also some inconsistencies in the way the key symbols are presented):
{:Baker=>"£1.20", :"Corner Shop"=>"£1.20", :"Cheese Shop"=>"£3.99"}
Why is this happening? The output above gives the data in the desired format, but is missing most of the records. I'd eventually like to extend this program to provide totals for each shop by only adding a new hash entry if a given key doesn't already exist, but I'd like to get to the bottom of this issue before going any further.
I've spent a fair amount of time on print debugging and can confirm that the data is being correctly parsed by the file.each_line.map method, with each record being turned into a subarray containing the fields as expected. The problem appears to stem from the next line, which attempts to add just the shop and price fields to the hash.
I've also checked Stack Overflow and noticed that similar incorrect outputs often stem from attempting to iterate over an array whilst changing it, although that doesn't appear to be what I'm trying to do here (please correct me if I'm wrong). I've also experimented with using the duplicate method to create a copy of each subarray rather than trying to create the hash from the original data, but still get the same result.
I'd be grateful if someone could please enlighten me as to what is going on here.
Thanks in advance.
Upvotes: 1
Views: 123
Reputation: 26690
I now realize you are trying to calculate the total per-shop, so your approach is very close but you'll need to update it to increase the total per key rather than replacing it on each iteration.
Below is an example of how to get this done:
item1 = {name: "White Bread", price: 1.20, shop: "Baker" }
item2 = {name: "Whole Milk", price: 0.80, shop: "Corner Shop"}
item3 = {name: "Gorgonzola", price: 10.20, shop: "Cheese Shop"}
item4 = {name: "Mature Cheddar", price: 5.20, shop: "Cheese Shop"}
item5 = {name: "Limburger", price: 6.35, shop: "Cheese Shop"}
item6 = {name: "Newspaper", price: 1.20, shop: "Corner Shop"}
item7 = {name: "Ilchester", price: 3.99, shop: "Cheese Shop"}
list = [item1, item2, item3, item4, item5, item6, item7]
totals = {}
list.each do |item|
  key = item[:shop].to_sym
  if totals[key].nil?
    # initialize the total for this shop
    totals[key] = item[:price]
  else
    # increase the previous total for this shop
    totals[key] = totals[key] + item[:price]
  end
end
puts totals
Using your original code I think the solution would be something like this:
totals = {}
File.open("shopping.tsv") do |file|
  records = file.each_line.map { |line| line.chomp.split("\t") }
  records.each do |_, price, shop|
    # convert the price to a number (i.e. drop the leading £)
    price_num = price[1..].to_f
    if totals[shop.to_sym].nil?
      # initialize the total for this shop
      totals[shop.to_sym] = price_num
    else
      # increase the previous total for this shop
      totals[shop.to_sym] = totals[shop.to_sym] + price_num
    end
  end
  puts(totals)
end
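To see why the original output kept only one price per shop, here is a minimal sketch that uses a few of the question's rows as an in-memory array (an assumption, so that it runs without shopping.tsv). The names last_seen and by_shop are illustrative only. Assigning to a key that already exists replaces its value, whereas appending to a per-shop list keeps every price:

```ruby
# Rows shaped like the output of line.chomp.split("\t") (sample data from the question)
records = [
  ["White Bread", "£1.20", "Baker"],
  ["Whole Milk", "£0.80", "Corner Shop"],
  ["Newspaper", "£1.20", "Corner Shop"]
]

# Plain assignment: a later row for the same shop replaces the earlier value
last_seen = {}
records.each { |_, price, shop| last_seen[shop] = price }

# Collecting instead of replacing: every price for a shop is kept
by_shop = Hash.new { |hash, key| hash[key] = [] }
records.each { |_, price, shop| by_shop[shop] << price }

p last_seen
p by_shop
```

So no records are being skipped during parsing; each one is written to the hash, but rows that share a shop overwrite one another.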
Upvotes: 2
Reputation: 114237
To parse a delimited file, you can use Ruby's CSV library. It defaults to "," as the delimiter, but you can easily specify "\t" for tab:
require 'csv'
CSV.foreach("shopping.tsv", col_sep: "\t") do |row|
  p product: row[0], price: row[1], shop: row[2]
end
If you prefer named references, you can also specify headers:
CSV.foreach("shopping.tsv", col_sep: "\t", headers: [:product, :price, :shop]) do |row|
  p product: row[:product], price: row[:price], shop: row[:shop]
end
Both of the above will output:
{:product=>"White Bread", :price=>"£1.20", :shop=>"Baker"}
{:product=>"Whole Milk", :price=>"£0.80", :shop=>"Corner Shop"}
{:product=>"Gorgonzola", :price=>"£10.20", :shop=>"Cheese Shop"}
{:product=>"Mature Cheddar", :price=>"£5.20", :shop=>"Cheese Shop"}
{:product=>"Limburger", :price=>"£6.35", :shop=>"Cheese Shop"}
{:product=>"Newspaper", :price=>"£1.20", :shop=>"Corner Shop"}
{:product=>"Ilchester", :price=>"£3.99", :shop=>"Cheese Shop"}
Note that the prices are still strings. In order to calculate a sum, you have to convert them into something numerical. Since these are monetary values, I'd recommend the 3rd-party Money gem and its Monetize addition for parsing string values:
require 'money'
require 'monetize'
I18n.config.available_locales = :en
Money.locale_backend = :i18n
Money.default_currency = Money::Currency.new('GBP')
It allows you to parse monetary string values into Money instances and, once parsed, perform arithmetic operations and formatting (among many other features):
a = Monetize.parse("£1.20")
#=> #<Money fractional:120 currency:GBP>
b = Monetize.parse("£0.80")
#=> #<Money fractional:80 currency:GBP>
c = a + b
#=> #<Money fractional:200 currency:GBP>
c.format
#=> "£2.00"
Putting CSV and Money/Monetize together:
totals = Hash.new(Money.zero)
CSV.foreach("shopping.tsv", col_sep: "\t", headers: [:product, :price, :shop]) do |row|
  totals[row[:shop]] += Monetize.parse(row[:price])
end
p totals.transform_values(&:format)
Note that I'm using += to actually add each price to the corresponding shop's hash entry.
Output:
{"Baker"=>"£1.20", "Corner Shop"=>"£2.00", "Cheese Shop"=>"£25.74"}
You might've wondered what totals = Hash.new(Money.zero) is for: it creates a hash with a default value of £0.00. Having a default value for each key allows us to add the price values right away without having to worry about the initial value being nil.
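If you'd rather not add a gem, the same default-value idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: the rows are hard-coded rather than read from the file, every price is assumed to have exactly two decimal places, and amounts are tracked as integer pence to sidestep floating-point rounding. The names prices and formatted are made up for the example.

```ruby
# Sample (shop, price) pairs taken from the question's data
prices = [
  ["Baker", "£1.20"],
  ["Corner Shop", "£0.80"],
  ["Corner Shop", "£1.20"]
]

# Missing keys read as 0, so += works on the first occurrence of a shop
totals = Hash.new(0)

prices.each do |shop, price|
  # Assumes the "£<pounds>.<two-digit pence>" shape seen in the file
  pounds, pence = price.delete("£").split(".")
  totals[shop] += pounds.to_i * 100 + pence.to_i
end

# Turn integer pence back into a display string
formatted = totals.transform_values { |p| format("£%d.%02d", p / 100, p % 100) }
p formatted
```

Reading a missing key such as totals["Nowhere"] returns the default without storing anything; an entry is only created once += assigns to it.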
Upvotes: 4