Reputation: 1283
G'day guys, I'm currently using FasterCSV to parse a CSV file in Ruby, and I'm wondering how to get rid of the first row of data in the file. (The first row contains time/date information generated by another software package.)
I tried loading the file with FasterCSV.table, deleting row(0), converting the table back to a CSV document, and then parsing that,
but the row was still present in the document.
Any other ideas?
fTable = FasterCSV.table("sto.csv", :headers => true)
fTable.delete(0)
Upvotes: 2
Views: 3128
Reputation: 156
Hi, I'm doing just that with some data from the Australian Electoral Commission. The file in question has a date string on the first line and the headers on the second.
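require 'csv'
require 'open-uri'

filename = "http://results.aec.gov.au/15508/Website/Downloads/SenateGroupVotingTicketsDownload-15508.csv"
# URI.open (from open-uri) returns an IO-like object for the remote file
file = URI.open(filename)
# Read and discard the date string on the first line
first_line = file.readline
# The next line holds the headers, so the rest parses as normal CSV
CSV.parse(file, headers: true).each do |row|
  puts row["State"]
end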
I presume the file I quote still exists, but it can be replaced by the file in question. If you need to skip more rows, call file.readline once for each row you want to skip.
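To try the same idea without a network connection, here is a minimal sketch against an in-memory file. The helper name parse_after_preamble, its skip parameter, and the sample data are made up for illustration, and it assumes the standard csv library.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: discard `skip` leading lines of an IO, then parse the
# remainder as CSV whose first remaining line holds the headers.
def parse_after_preamble(io, skip: 1)
  skip.times { io.gets }  # gets returns nil at EOF instead of raising
  CSV.parse(io.read, headers: true)
end

# Made-up sample: a date string on line 1, headers on line 2.
sample = StringIO.new(<<~TEXT)
  Generated 2013-08-20 10:15:00
  State,Ticket
  NSW,A
  VIC,B
TEXT

parse_after_preamble(sample).each { |row| puts row["State"] }
```

Using gets rather than readline means a file shorter than the skip count yields an empty table instead of raising EOFError.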
Upvotes: 2
Reputation: 26363
You could pass :headers => true and leave :return_headers at its default of false, so that the bad line is consumed as the header row and skipped. That'll work great if the second line isn't the real header. See the FasterCSV documentation for more:
:return_headers:
When false, header rows are silently swallowed. If set to true, header rows are returned in a FasterCSV::Row object with identical headers and fields (save that the fields do not go through the converters).
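A small sketch of how the option behaves, assuming the standard csv library (which took over from FasterCSV in Ruby 1.9) and made-up sample data with no real header line. The two method names are hypothetical.

```ruby
require 'csv'

# Made-up sample: a timestamp line followed directly by data rows.
STAMPED_SAMPLE = "Generated 2013-08-20\nNSW,A\nVIC,B\n"

# headers: true treats the first line as the header row; with the default
# return_headers: false that row is not handed back as data.
def rows_without_first_line(text)
  CSV.parse(text, headers: true).map(&:fields)
end

# return_headers: true hands the header row back as the first row.
def rows_with_header_row(text)
  CSV.parse(text, headers: true, return_headers: true).map(&:fields)
end

p rows_without_first_line(STAMPED_SAMPLE)
p rows_with_header_row(STAMPED_SAMPLE)
```

In other words, it is the swallowing behaviour (return_headers left false) that drops the unwanted line.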
You don't need to use Ruby for this. How about chopping the first line off the file using one of the solutions suggested here? You can call the one-liners from Ruby using the system method.
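As one possible sketch of that approach: the specific one-liner is an assumption here (tail -n +2 on a Unix-like system), and the method name chop_first_line is hypothetical.

```ruby
# Hypothetical helper: copy `src` to `dest` without its first line by calling
# tail(1). Assumes a Unix-like system where `tail -n +2` is available.
def chop_first_line(src, dest)
  # The multi-argument form of system avoids the shell; out: redirects
  # the command's standard output to the destination file.
  system("tail", "-n", "+2", src, out: dest) or raise "tail failed for #{src}"
end
```

For example, chop_first_line("sto.csv", "sto_trimmed.csv") would write a copy that starts at the second line, which can then be parsed as usual.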
Have you considered reading the file directly, skipping the first line and then accepting or rejecting lines? Deep in the heart of my code is this parse method which treats the file as a series of lines, accepting or rejecting each. You could do something similar but skip over the first row.
The neat thing is that you get to determine which rows are acceptable by defining your own acceptable? method. Only valid CSV data is passed to acceptable?; malformed lines raise an exception and are rejected in the rescue clause.
def parse(file)
  #
  # Parse data, one line at a time
  #
  row = []
  file.each_line do |line|
    the_line = line.chomp
    begin
      row = FasterCSV.parse_line(the_line)
      # acceptable? returns [ok, message]
      ok, message = acceptable?(row)
      if not ok
        reject(file.lineno, the_line, message)
      else
        accept(row, the_line)
      end
    rescue FasterCSV::MalformedCSVError => e
      # Lines that are not valid CSV are rejected with the parser's message
      reject(file.lineno, the_line, e.to_s)
    end
  end
end
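To make the "skip over the first row" variant concrete, here is a self-contained sketch. It swaps in the standard csv library for FasterCSV, and the LineFilter class, its skip parameter, and the acceptance rule are all made up for illustration, since the original accept, reject and acceptable? methods are not shown.

```ruby
require 'csv'
require 'stringio'

# Hypothetical wrapper around the line-by-line approach.
class LineFilter
  attr_reader :accepted, :rejected

  def initialize(skip: 1)
    @skip = skip      # number of leading lines to ignore
    @accepted = []    # parsed rows that passed acceptable?
    @rejected = []    # [line number, raw line, reason]
  end

  def parse(io)
    io.each_line do |line|
      next if io.lineno <= @skip  # lineno is 1 for the first line read
      the_line = line.chomp
      begin
        row = CSV.parse_line(the_line)
        ok, message = acceptable?(row)
        if ok
          @accepted << row
        else
          @rejected << [io.lineno, the_line, message]
        end
      rescue CSV::MalformedCSVError => e
        @rejected << [io.lineno, the_line, e.message]
      end
    end
    self
  end

  private

  # Assumed rule for illustration: exactly two non-empty fields.
  def acceptable?(row)
    return [false, "blank line"] if row.nil?
    return [false, "expected 2 fields"] unless row.size == 2
    return [false, "empty field"] if row.any? { |f| f.nil? || f.empty? }
    [true, nil]
  end
end

filter = LineFilter.new.parse(StringIO.new("Generated 2013-08-20\nNSW,A\nVIC\n"))
p filter.accepted
p filter.rejected
```

Collecting rejections with their line numbers keeps the skipped timestamp line separate from genuinely bad data.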
Upvotes: 4