Angela Jonon
Angela Jonon

Reputation: 253

Selecting a single value field from CSV file

I want to extract a specific value from a single field in my CSV file, but any research I have done points towards using hashes to extract whole columns of data rather than value.

Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!

For example, I wish to extract the values 2548 and 4352 to add and merge into a total for a new row.

I have used:

CSV.foreach("file.csv") { |row| col_data_new << row[5] } 

to extract values from a column into an array but this time I just want one value.

Upvotes: 0

Views: 2465

Answers (1)

the Tin Man
the Tin Man

Reputation: 160551

Yes, hashes are the way to go:

require 'csv'

data = 'Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
'

CSV.parse(data, headers: :first_row).map{ |row| row["Total $ spent"] }
# => ["2548", "5054", "4352", "6542"]

Pretend that

CSV.parse(data, headers: :first_row)

is really

CSV.foreach('some/file.csv', headers: :first_row)

and the data is really in a file.

The reason you want to use headers: :first_row is that tells CSV to gobble up the first line. Then it'll return a hash for each record, using the associated header field for the keys, making it easier to retrieve specific fields by name.

From the documentation:

:headers

If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers.

Alternate ways of doing this are:

spent = CSV.parse(data).map{ |row| row[2] }
spent.shift

spent
# => ["2548", "5054", "4352", "6542"]

spent.shift drops the first element from the array, which was the header field for that column, leaving the array containing only values.

Or:

spent = []
skip_headers = true
CSV.parse(data).each do |row| 

  if skip_headers
    skip_headers = false
    next
  end

  spent << row[2]
end

spent
# => ["2548", "5054", "4352", "6542"]

Similar to the shift statement above, the next is telling Ruby to skip to the next iteration of the loop and not process the rest of the instructions in the block, which results in the header record being skipped in the final output.

Once you have the values from the fields you want you can selectively extract specific ones. If you want the values "2548" and "4352", you have to have a way of determining which rows those are in. Using arrays (the non-header method) makes it more awkward to do, so I'd do it using hashes again:

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

Notice that it's very clear what is going on which is important in code. Using the case and when allow me to easily add additional names to include. That acts like a chained "or" conditional test on an if statement, but without the additional noise.

each_with_object is similar to inject, except it is cleaner when we need to aggregate values into an Array, Hash or some object.

Summing the array is easy and there are many different ways to get there, but I'd use:

spent.map(&:to_i).inject(:+) # => 6900

Basically that converts the individual elements to integers and adds them together. (There's more to it but that's not important until farther up your learning curve.)


I am just wondering if it is possible to replace the contents of the 'when' condition with an array of strings to iterate over rather than hard coded strings?

Here's a solution using an Array:

NAMES = %w[Dan Carlos]

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case row['Name']
  when *NAMES
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

If the list of names is large I think this solution will run slower than necessary. Arrays are great for storing data you're going to get to, as a queue, or for remembering their order like a stack, but they're bad when you have to walk it just to find something. Even a sorted Array and using a binary search is likely to be slower than using a Hash because of the extra steps involved in using them. Here's an alternate way of doing this, but using a Hash:

NAMES = %w[Dan Carlos].map{ |n| [n, true] }.to_h

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case
  when NAMES[row['Name']]
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

But that can be refactored to be more readable:

NAMES = %w[Dan Carlos].each_with_object({}) { |a, h| h[a] = true }
# => {"Dan"=>true, "Carlos"=>true}

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  ary << row['Total $ spent'] if NAMES[row['Name']]
end

spent
# => ["2548", "4352"]

Upvotes: 1

Related Questions