Reputation: 253
I want to extract a specific value from a single field in my CSV file, but all the research I have done points towards using hashes to extract whole columns of data rather than a single value.
Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
For example, I wish to extract the values 2548 and 4352 to add together and merge into a total for a new row.
I have used:
CSV.foreach("file.csv") { |row| col_data_new << row[5] }
to extract values from a column into an array, but this time I just want a single value.
Upvotes: 0
Views: 2465
Reputation: 160551
Yes, hashes are the way to go:
require 'csv'
data = 'Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
'
CSV.parse(data, headers: :first_row).map{ |row| row["Total $ spent"] }
# => ["2548", "5054", "4352", "6542"]
Pretend that CSV.parse(data, headers: :first_row) is really CSV.foreach('some/file.csv', headers: :first_row) and that the data is really in a file.
The reason you want to use headers: :first_row is that it tells CSV to gobble up the first line as the headers. Then it'll return a hash-like row (a CSV::Row) for each record, using the associated header field for the keys, making it easier to retrieve specific fields by name.
From the documentation:
:headers
If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers.
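For example, here's a minimal sketch (assuming the data is in "file.csv" as shown in the question) that pulls one field out of one record by name using foreach:
require 'csv'

# Assumption: the question's data lives in "file.csv".
# With headers on, each row acts like a hash keyed by the header names,
# so a single field from a single record can be picked out directly.
carlos_spent = nil
CSV.foreach('file.csv', headers: :first_row) do |row|
  carlos_spent = row['Total $ spent'] if row['Name'] == 'Carlos'
end
carlos_spent
# => "4352"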
Alternate ways of doing this are:
spent = CSV.parse(data).map{ |row| row[2] }
spent.shift
spent
# => ["2548", "5054", "4352", "6542"]
spent.shift drops the first element from the array, which was the header field for that column, leaving only the values in the array.
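As a small variant sketch, if you'd rather not mutate the array, drop(1) discards the header element in the same step:
spent = CSV.parse(data).map{ |row| row[2] }.drop(1)
spent
# => ["2548", "5054", "4352", "6542"]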
Or:
spent = []
skip_headers = true
CSV.parse(data).each do |row|
  if skip_headers
    skip_headers = false
    next
  end
  spent << row[2]
end
spent
# => ["2548", "5054", "4352", "6542"]
Similar to the shift above, next tells Ruby to skip to the next iteration of the loop and not process the rest of the instructions in the block, which results in the header record being skipped in the final output.
Once you have the values from the fields you want you can selectively extract specific ones. If you want the values "2548" and "4352", you have to have a way of determining which rows those are in. Using arrays (the non-header method) makes it more awkward to do, so I'd do it using hashes again:
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]
Notice that it's very clear what is going on, which is important in code. Using case and when allows me to easily add additional names to include. It acts like a chained "or" conditional test on an if statement, but without the additional noise.
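For comparison, here's a sketch of the same test written as that chained "or" on an if:
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  if row['Name'] == 'Dan' || row['Name'] == 'Carlos'
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]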
each_with_object is similar to inject, except it is cleaner when we need to aggregate values into an Array, Hash or some other object.
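For reference, here's a sketch of the same loop written with inject; the trailing ary the block has to return is exactly the noise each_with_object avoids:
spent = CSV.parse(data, headers: :first_row).inject([]) do |ary, row|
  ary << row['Total $ spent'] if ['Dan', 'Carlos'].include?(row['Name'])
  ary
end
spent
# => ["2548", "4352"]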
Summing the array is easy and there are many different ways to get there, but I'd use:
spent.map(&:to_i).inject(:+) # => 6900
Basically that converts the individual elements to integers and adds them together. (There's more to it, but that's not important until further up your learning curve.)
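A couple of other sketches that spell the same sum, in case inject(:+) looks too terse:
spent.inject(0) { |sum, s| sum + s.to_i }  # => 6900
spent.map(&:to_i).sum                      # => 6900 (Ruby 2.4+)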
I am just wondering if it is possible to replace the contents of the 'when' condition with an array of strings to iterate over, rather than hard-coded strings?
Here's a solution using an Array:
NAMES = %w[Dan Carlos]
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case row['Name']
  when *NAMES
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]
If the list of names is large, I think this solution will run slower than necessary. Arrays are great for storing data you're going to work through in order, as a queue, or for remembering order, like a stack, but they're bad when you have to walk them just to find something. Even a sorted Array with a binary search is likely to be slower than a Hash because of the extra steps involved in using it. Here's an alternate way of doing this, but using a Hash:
NAMES = %w[Dan Carlos].map{ |n| [n, true] }.to_h
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  case
  when NAMES[row['Name']]
    ary << row['Total $ spent']
  end
end
spent
# => ["2548", "4352"]
But that can be refactored to be more readable:
NAMES = %w[Dan Carlos].each_with_object({}) { |a, h| h[a] = true }
# => {"Dan"=>true, "Carlos"=>true}
spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  ary << row['Total $ spent'] if NAMES[row['Name']]
end
spent
# => ["2548", "4352"]
Upvotes: 1